Cycled clustering for redundancy coded data storage systems

ABSTRACT

A cluster of data transfer devices is used to augment the capabilities of a data storage system. For example, the cluster of data transfer devices may be configured to store a portion of a bundle of redundancy coded shards in a similar fashion as a data storage system. As another example, the cluster may be configured to provide other capabilities incident to the devices used, such as computational capabilities. Data stored on the cluster may be read from and written directly to the cluster without transfer of data to the data storage system. In some embodiments, a connecting entity (such as a customer entity) may interchangeably interface with the data storage system and the cluster, and the requested capabilities may be directed to either in a fashion that is transparent to the requestor.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application incorporates by reference for all purposes the full disclosure of co-pending U.S. patent application Ser. No. 14/788,671, filed Jun. 30, 2015, entitled “SHIPPABLE NETWORK-ATTACHED DATA STORAGE DEVICE WITH UPDATEABLE ELECTRONIC DISPLAY,” co-pending U.S. patent application Ser. No. 14/789,783, filed Jul. 1, 2015, entitled “GRID ENCODED DATA STORAGE SYSTEMS FOR EFFICIENT DATA REPAIR,” co-pending U.S. patent application Ser. No. 14/741,409, filed Jun. 16, 2015, entitled “ADAPTIVE DATA LOSS MITIGATION FOR REDUNDANCY CODING SYSTEMS,” co-pending U.S. patent application Ser. No. 15/083,115, filed concurrently herewith, entitled “LOCAL STORAGE CLUSTERING FOR REDUNDANCY CODED DATA STORAGE SYSTEM,” and co-pending U.S. patent application Ser. No. 15/083,145, filed concurrently herewith, entitled “HYBRIDIZED STORAGE OPERATION FOR REDUNDANCY CODED DATA STORAGE SYSTEMS.”

BACKGROUND

Modern computer systems make extensive use of network computing and network data storage systems. Such use has proliferated in recent years, particularly in distributed or virtualized computer systems where multiple computer systems may share resources when performing operations and tasks associated with the computer systems. Such computer systems frequently utilize distributed data storage in multiple locations to store shared data items so that such data items may be made available to a plurality of consumers. The resources for network computing and network data storage are often provided by computing resource providers who leverage large-scale networks of computers, servers, and storage drives to enable customers to host and execute a variety of applications and web services. The usage of network computing and network data storage allows customers to efficiently and adaptively satisfy their varying computing needs, whereby the computing and data storage resources that may be required by the customers are added or removed from a large pool provided by a computing resource provider as needed.

The proliferation of network computing and network data storage, as well as the attendant increase in the number of entities dependent on network computing and network data storage, have increased the frequency and amplitude of demand spikes, and in some cases, such demand spikes are not easily predicted. Database services optimized to scale for certain types of increased demand, such as payload size, may not necessarily be capable of handling demand on a different axis, such as requested transaction rate.

BRIEF DESCRIPTION OF THE DRAWINGS

Various techniques will be described with reference to the drawings, in which:

FIG. 1 illustrates an example environment in which a data storage system interfaces with data transfer devices to store a set of redundancy coded shards, in accordance with some embodiments;

FIG. 2 illustrates an example environment in which a plurality of data transfer devices may be implemented to provide scalable data services, in accordance with some embodiments;

FIG. 3 illustrates an example environment in which a plurality of data transfer devices are cycled to improve local availability and/or durability, in accordance with some embodiments;

FIG. 4 illustrates an example process for storing redundancy coded data in a hybridized data storage system, in accordance with some embodiments;

FIG. 5 illustrates an example process for scalably provisioning a plurality of data transfer devices to provide data-related functionality, in accordance with some embodiments;

FIG. 6 illustrates an example process for cycling data transfer devices provisioned to an entity remote from a data storage system, in accordance with some embodiments;

FIG. 7 illustrates an example environment in which a computing resource service provider implements a data storage service, such as a grid storage service, to process and store data transacted therewith, in accordance with some embodiments;

FIG. 8 illustrates an example environment where a redundancy encoding technique is applied to data stored in durable storage in accordance with at least one embodiment;

FIG. 9 illustrates an example environment where a redundancy encoding technique is applied to data stored in durable storage in accordance with at least one embodiment;

FIG. 10 illustrates an example process for applying redundancy encoding techniques to data stored in durable storage in accordance with at least one embodiment; and

FIG. 11 illustrates an environment in which various embodiments can be implemented.

DETAILED DESCRIPTION

In one example, data stored on a data storage system is redundancy coded so as to improve durability, availability, and other aspects of the storage thereof. In connection with the storage of the data, one or more data transfer devices may be provisioned to locations external to the data storage system, such as customer locations, so as to acquire customer data for storage using the data storage system. The data transfer devices may be configured with persistent storage and other features, such as features described in further detail in the incorporated disclosures, so as to extend the capabilities of the data storage system on an ongoing basis and, in some cases, for an indeterminate length of time.

For example, the data transfer devices may be configured to be deployed to a customer location and store one or more shards of a plurality of shards. In some embodiments, the plurality of shards may be a bundle of redundancy coded (bundle encoded) shards, such that some of the bundle of shards is stored on durable storage of the data storage system and some of the bundle of the shards is resident on the data transfer device(s). In embodiments where the bundle of shards includes identity shards (having original forms of stored data) and derived/encoded shards (having redundancy coded forms of the stored data of the bundle), many configurations and allocations of shards with respect to the locations at which the shards are stored may be contemplated. As one example, some identity shards of a bundle may be allocated to the data transfer device(s), while the remaining shards of the bundle (including the remaining identity shards and the encoded/derived shards of the bundle) are stored on the durable storage of the data storage system. In this scenario, a customer or other entity may directly store customer data into the identity shards on the data transfer device, and, in connection with the storage, the encoded shard on the data storage system is updated to reflect the added customer data. The update may be performed as a result of the received data being transmitted over the network from the data transfer device, or after the data transfer device is shipped back to the location of the data storage system and ingested, according to techniques described in the incorporated disclosures. As may be contemplated, other types of redundancy coding, such as linear erasure coding, may be used (and may, in some cases, generate all derived/encoded shards, rather than some shards having an original form of the data (e.g., identity shards)).
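The update of the encoded shard described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a bundle of two identity shards and one derived shard computed as a bytewise XOR parity; the redundancy codes of the incorporated disclosures may differ, and the names `xor_bytes`, `write_identity`, `on_device`, `in_system`, and `derived` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("shards must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def write_identity(identity: bytearray, derived: bytearray,
                   offset: int, data: bytes) -> None:
    """Write data into an identity shard and fold the change into the derived shard.

    Only the delta (old XOR new) is applied to the derived shard, so the other
    identity shards of the bundle do not need to be read.
    """
    end = offset + len(data)
    if offset < 0 or end > len(identity):
        raise ValueError("write exceeds shard size")
    delta = xor_bytes(bytes(identity[offset:end]), data)
    identity[offset:end] = data
    derived[offset:end] = xor_bytes(bytes(derived[offset:end]), delta)


# Assumed layout: one identity shard on the data transfer device, one identity
# shard and the derived shard on the data storage system.
SHARD_SIZE = 8
on_device = bytearray(SHARD_SIZE)
in_system = bytearray(b"ABCDEFGH")
derived = bytearray(xor_bytes(bytes(on_device), bytes(in_system)))

# The customer writes directly to the device-resident identity shard.
write_identity(on_device, derived, 0, b"customer")
```

In this sketch the delta could be sent over a network immediately or applied later during ingestion; until it is applied, the newly written data is not yet protected by the derived shard.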

In circumstances where the bundle in the preceding example has a quorum quantity of shards that is equal to or less than the number of the shards in the bundle that reside within the data storage system, after the update of the encoded (derived) shard, the data may be directly readable from the identity shard on the data transfer device or, in the alternative, by reconstruction of the customer data from the quorum of shards resident within the data storage system.

As may be contemplated, the identity shards of the bundle apportioned to the data storage system may be used to store additional data from, e.g., a different customer. After adding such additional data, the encoded shard need only be updated to retain the same durability characteristics as before the additional data was added, for all of the data represented in the bundle. Furthermore, the additional data may be directly readable from the identity shards stored in the data storage system, and in circumstances where the number of available shards of the bundle within the data storage system drops below the quorum quantity, the shards of the bundle resident on the data transfer devices may be “borrowed” to meet the quorum quantity and, thus, regenerate the data.

The data transfer devices may also transfer data to a second data storage system, such as a system with a different availability level or retrieval latency, and may stand in as a proxy for the second data storage system until data received by data transfer devices is transferred to the second data storage system. A given bundle of shards may thus span between two or more data storage systems, between data storage systems and data transfer devices, or any combination thereof, and still retain the various advantages of the selected redundancy coding or other encoding selected and implemented according to the techniques described in further detail herein and in the incorporated disclosures.

The data transfer devices may be configured to mimic the operation of the data storage system without relying on the full capabilities of the data storage system. For example, the data transfer devices may be clustered to provide a certain level of storage, durability, computational capability, and the like, that would otherwise be available by provisioning a similar level of capability directly from the data storage system. Transfer of data and provision of capabilities may be transparent as between the clustered data transfer devices and the data storage system. In some examples, the quantity of and/or capabilities delivered by the clustered data transfer devices may be scaled up or down on demand, such as by requisition of additional data transfer devices to add to the cluster, or by removal of one or more data transfer devices from the cluster. The cluster may include a local version of interfaces, such as application programming interfaces or web service interfaces, that are similar to those provided by the data storage service, and thus facilitate transparent and/or flexible conversion and/or extension between capabilities provided directly by the cluster and those provided by the data storage system to which the cluster is associated.

Data may be stored in bundles of redundancy coded shards, which in some cases may overlap in such a fashion as to allow for various shards (and, in some embodiments, data transfer devices storing such shards) to be cycled on an ongoing basis. For example, two overlapping bundles of bundle-encoded shards may include a first bundle having two identity shards and an encoded shard, and a second bundle having two identity shards and an encoded shard, where one of the identity shards of the first bundle is the same identity shard as in the second bundle. A fill pattern may be implemented such that the identity shards accept data in a specified order. When a leading identity shard in the order reaches or approaches capacity (or some other event occurs), that identity shard is transferred (e.g., via an ingestion process) to a data storage system for durable storage.

While the data is being transferred (e.g., physically), the other two shards in the first bundle retain availability and durability of the data in that first identity shard, such that if the transfer fails or if the first identity shard is destroyed, it may simply be recreated from the other two shards in the first bundle. However, if the data is successfully transferred, the associated derived shard, as well as the first identity shard, are erased, and in some embodiments added to a new bundle that includes the trailing identity shard of the second bundle (and thus the “new” identity shard is added to the back end of the fill order). While the first (leading) identity shard is unavailable, whether because it is in physical transit or because it is transferring data to the data storage system, after the triggering event, the next identity shard in the fill order becomes the leading identity shard (of the second bundle, in the provided example), and is used until a different event causes that identity shard to be transferred to the data storage system. The process repeats, in some cases indefinitely.

In the preceding and following description, various techniques are described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of possible ways of implementing the techniques. However, it will also be apparent that the techniques described below may be practiced in different configurations without the specific details. Furthermore, well-known features may be omitted or simplified to avoid obscuring the techniques being described.

FIG. 1 illustrates an example environment in which a data storage system interfaces with data transfer devices to store a set of redundancy coded shards, in accordance with some embodiments.

A customer entity 104, via, e.g., a customer device, transacts data 102 with one or more data transfer devices 112 so as to cause the data 102 to be processed using one or more redundancy codes by, e.g., the data transfer device 112, a data storage system 116, or some combination thereof, to generate a plurality of shards 108, 110, to be stored on durable and/or persistent storage 106 associated with the data storage system 116 and/or the data transfer devices 112, according to one or more techniques described in the incorporated disclosures. For example, the data storage system 116, the data transfer device 112, or a combination thereof, may be configured to process incoming data 102 using a redundancy code, such as an erasure code, to generate the plurality of shards 108, 110. In this example, the plurality of shards may include identity shards 108 and encoded (derived) shards 110, which, as described in further detail herein and in the incorporated disclosures, may be part of a bundle of bundle-encoded shards. An example bundle encoding may generate, for a given set of input data (such as the data 102), a greater number of shards than a quorum quantity of the shards that is sufficient to recreate the original form of some or all of the data represented in the bundle.

The generated bundle of shards (which may, in some embodiments, be a part of a grid of shards, also described in further detail herein) may be considered, depending on the redundancy code used and the configured quorum quantity, to have two separate copies of a given piece of data stored in an identity shard. Specifically, one copy may be stored and directly retrievable in original form in an identity shard, and a second copy may be stored as some or all of the remainder of the shards as a bundle (e.g., recoverable by processing a quorum quantity of the remaining shards, using the redundancy code, so as to generate that data). This particular property may be used to apportion the shards of the bundle across multiple entities, so as to create a hybridized data storage system that utilizes the data storage system 116 and one or more data transfer devices 112, 114 that store disparate shards of the same bundle (e.g., the bundle is apportioned across multiple systems).
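The "two copies" property can be sketched as two read paths: a direct read from the identity shard, and a reconstruction from the remaining shards of the bundle. The sketch below assumes a single XOR-parity derived shard (so any one missing shard equals the XOR of all the others); this is an illustrative stand-in for the redundancy code, and `read_identity` and the sample shard contents are hypothetical.

```python
from typing import Optional, Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("shards must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def read_identity(target: Optional[bytes], others: Sequence[bytes]) -> bytes:
    """Return the target shard directly if available, else rebuild it.

    `others` must hold every remaining shard of the bundle (the other identity
    shards plus the XOR-derived shard), which is the quorum in this sketch.
    """
    if target is not None:
        return target  # first copy: original form, directly retrievable
    if not others:
        raise ValueError("no shards available for reconstruction")
    rebuilt = bytes(len(others[0]))
    for shard in others:
        rebuilt = xor_bytes(rebuilt, shard)
    return rebuilt  # second copy: regenerated via the redundancy code


id_on_device = b"device-A"   # identity shard held by a data transfer device
id_in_system = b"system-B"   # identity shard held by the data storage system
derived = xor_bytes(id_on_device, id_in_system)

direct = read_identity(id_on_device, [id_in_system, derived])
rebuilt = read_identity(None, [id_in_system, derived])
```

Both calls yield the same bytes, which is what allows the device-resident shard and the system-resident shards to be apportioned to different locations.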

As discussed in further detail in the incorporated disclosures, the data transfer devices 112, 114 may be configured to receive and store data 102, such as on persistent storage of the data transfer devices 112, 114, for later transfer of data via an ingestion process into durable storage of a data storage system 116. The capabilities of the data transfer devices 112, 114 may also include an inherent level of durability and availability of data stored thereon that lend the data transfer devices 112, 114 to service not only as transfer devices, but also as devices that provide data storage, archival, and computational capabilities similar in operation to that of the data storage system 116 for a fixed or indeterminate length of time (and in some cases, such capabilities may be better attuned to the local environment of the customer entity 104 than that of the larger data storage system 116 the devices augment).

In some embodiments, in connection with the storage of the data 102, one or more data transfer devices 112 may be provisioned to locations external to the data storage system 116, such as customer locations associated with the customer entity 104, so as to acquire customer data 102 for storage using the data storage system 116. For example, the data transfer devices 112, 114 may be configured to be deployed to the customer location and store one or more shards 108 of a bundle of shards, such that some of the bundle of shards is stored on durable storage of the data storage system and some of the bundle of the shards is resident on the data transfer device(s). In embodiments where the bundle of shards includes identity shards 108 (having original forms of stored data) and derived/encoded shards 110 (having redundancy coded forms of the stored data of the bundle), many configurations and allocations of shards with respect to the locations at which the shards are stored, and by extension the systems and/or devices to which they are allocated, are contemplated.

As one example, some identity shards 108 of a bundle may be allocated to the data transfer device(s) 112, while the remaining shards of the bundle (including the remaining identity shards 108 and the encoded/derived shards 110 of the bundle) are stored on the durable storage of the data storage system. In this scenario, a customer entity may directly store customer data 102 into the identity shards 108 allocated to the data transfer device 112, and, in connection with the storage thereof, the encoded shard 110 on the data storage system 116 is updated to reflect the added customer data 102. The update may be performed as a result of the received data 102 being transmitted over a network, such as the Internet, from the data transfer device (e.g., in embodiments where the data transfer device is operably connected to the data storage system 116 in connection with arriving at the customer's location), or after the data transfer device 112 is shipped back to the location of the data storage system 116 and ingested, according to techniques described in the incorporated disclosures. In the latter example, the customer data 102 resident on the data transfer device is not redundant or durably replicated in the bundle until such time as the data has been ingested into the data storage system 116 and the derived shard(s) 110 are updated.

In circumstances where the bundle in the preceding example has a quorum quantity of shards that is equal to or less than the number of the shards in the bundle allocated to the data storage system 116, after the update of the encoded (derived) shard 110, the data may be directly readable from the identity shard 108 apportioned to the data transfer device 112 or, in the alternative, by reconstruction of the customer data 102 from a quorum of the remainder of the shards 108, 110 apportioned to the data storage system 116. As may be contemplated, the identity shards of the bundle apportioned to the data storage system may be used to store additional data from, e.g., a different customer, and in accordance with other bundle-encoding techniques described herein and in the incorporated disclosures. After adding such additional data, the encoded shard 110 need only be updated to retain the same durability characteristics as before the additional data was added, for all of the data represented in the bundle. Furthermore, the additional data may be directly readable from the identity shards 108 stored in the data storage system 116, and in circumstances where the number of available shards of the bundle within the data storage system 116 drops below the quorum quantity, the shards of the bundle resident on the data transfer devices 112 may be “borrowed” to meet the quorum quantity and, thus, regenerate the data.
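The "borrowing" behavior can be sketched as a shard-selection step that prefers shards held by the data storage system and turns to device-resident shards only when the system alone cannot meet the quorum quantity. The function name `choose_shards`, the shard identifiers, and the availability flags are hypothetical illustration values, not part of the described system.

```python
from typing import Dict, List


def choose_shards(system_shards: Dict[str, bool],
                  device_shards: Dict[str, bool],
                  quorum: int) -> List[str]:
    """Pick shard ids for reconstruction.

    Each mapping is shard id -> currently available. Shards in the data
    storage system are preferred; device-resident shards are borrowed only
    to cover a shortfall.
    """
    chosen = [sid for sid, ok in system_shards.items() if ok][:quorum]
    shortfall = quorum - len(chosen)
    if shortfall > 0:
        borrowed = [sid for sid, ok in device_shards.items() if ok][:shortfall]
        chosen.extend(borrowed)
    if len(chosen) < quorum:
        raise RuntimeError("quorum quantity cannot be met")
    return chosen


# Assumed 3:2 bundle: two shards in the system (one unavailable), one on a device.
selected = choose_shards(
    system_shards={"identity-2": True, "derived-1": False},
    device_shards={"identity-1": True},
    quorum=2,
)
```

With the derived shard unavailable, the selection falls back to the device-resident identity shard to reach the quorum of two.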

In some embodiments, the data transfer devices 112 may also transfer data to a second data storage system, such as a system with a different availability level or retrieval latency, and may stand in as a proxy for the second data storage system until data received by data transfer devices is transferred to the second data storage system. A given bundle of shards may thus span between two or more data storage systems, between data storage systems and data transfer devices, or any combination thereof, and still retain the various advantages of the selected redundancy coding or other encoding selected and implemented according to the techniques described in further detail herein and in the incorporated disclosures.

The bundle of shards may be extended to include additional identity shards 108, which may be apportioned to additional data transfer devices 112, such as if a given customer entity desires additional storage capacity or if the bundle is to be extended to include additional customer entities (at, e.g., different locations). Such expansion may be performed using null shard allocation and conversion and other techniques described in further detail in the incorporated disclosures. In some embodiments, the data transfer devices 112 may be cycled (e.g., 114) as requested or in connection with information indicating that the data transfer device 112 currently in place at a customer location is malfunctioning, full, etc. As may be contemplated, a given portion of customer data 102 transferred to a data transfer device can be treated as durably stored once the encoded shards 110 are updated to reflect the customer data 102. Accordingly, a different data transfer device 112 may be sent to the customer location, and the existing data may be regenerated from the shards resident on the data storage system 116, and the resultant identity shard may be placed on the replacement data transfer device 112.

FIG. 2 illustrates an example environment in which a plurality of data transfer devices may be implemented to provide scalable data services, in accordance with some embodiments.

A plurality of data transfer devices, configured in a cluster 214, may be configured to mimic the operation of the data storage system 212 without relying on the full capabilities of the data storage system 212. For example, the cluster of data transfer devices 214 may be configured to provide a certain level of storage, durability, computational capability, and the like, that would otherwise be available by provisioning a similar level of capability directly from the data storage system 212. Transfer of data (e.g., customer data 202) and provisioning of capabilities may be transparent as between the clustered data transfer devices 214 and the data storage system 212.

In some examples, the quantity of and/or capabilities delivered by the clustered data transfer devices may be scaled up or down on demand, such as by requisition of additional data transfer devices to add to the cluster, or by removal of one or more data transfer devices from the cluster. Such scaling requests may be made by the customer entity 204 and directed to the data storage system 212, the cluster 214, or may be implied based on operational parameters of either the cluster 214 or the data storage system 212.

The cluster may include a local version of interfaces exposed to the customer entity 204, such as application programming interfaces (APIs) or web service interfaces, that are similar to those provided by the data storage system 212, and thus facilitate transparent and/or flexible conversion and/or extension between capabilities provided directly by the cluster 214 and those provided by the data storage system 212 with which the cluster is associated. As an example, the customer entity 204 may provide data 202 for archival or storage on durable and/or persistent storage 206 of the cluster 214, such as a bundle of redundancy coded shards 208, 210. Depending on how and to what extent the cluster 214 has been provisioned to store the data and/or process the data with the redundancy code, the customer entity 204 may submit data 202 to either the cluster 214 itself or the data storage system 212, and the data may be processed, transferred, and/or stored according to the level of provisioning, much as a data storage system 212 with multiple regions and/or availability zones provides a unified interface and transparent functionality with respect to the specific regions or availability zones in which the data is processed or stored. In other words, in the given example, the cluster of data storage devices 214 behaves and is treated as simply another region or portion of the larger data storage system 212, and may be scaled up and down according to request, demand, and the like.

The scaling of the capabilities of the cluster of data storage devices 214 may depend on the specific purpose or purposes provisioned from the cluster of data storage devices 214. For example, a customer associated with the customer entity 204 provisions the cluster of data storage devices 214 for a specific quantity of data storage space at a specified level of reliability and/or durability. As the customer's reliability, storage space, and/or durability requirements for the cluster change, e.g., by a request of the customer via the customer entity 204, by a process of the cluster 214 itself (such as using a monitor or watchdog process that alerts the cluster 214, or the larger data storage system 212, when the provisioned limits are being approached or if a level of usage drops below a specified proportion of the provisioned limits), and/or by a command or other process of the data storage system 212 with which the cluster is associated, additional data transfer devices may be added to the cluster or unneeded capacity/capability may be removed from the cluster (e.g., by removing data transfer devices from the cluster, throttling the existing devices in the cluster, or remotely provisioning unneeded capability/capacity to other clusters, the data storage system 212, or the like). In circumstances where additional capability/capacity is needed in the short term, the larger data storage system 212 may be configured to provide the additional capability/capacity for a period of time, in some cases indefinitely, and/or until additional data transfer devices can be added to the cluster 214.
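A monitor or watchdog of the kind described might reduce to a threshold check against the provisioned limits. The following is a minimal sketch under assumed values: the `ClusterUsage` fields, the 85% and 30% thresholds, and the action names are hypothetical and chosen only to show the shape of the decision.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    provisioned_bytes: int  # capacity currently provisioned to the cluster
    used_bytes: int         # capacity currently in use
    device_bytes: int       # capacity contributed by one data transfer device


def scaling_action(usage: ClusterUsage,
                   high: float = 0.85,
                   low: float = 0.30) -> str:
    """Return a scaling hint based on how close usage is to provisioned limits."""
    if usage.provisioned_bytes <= 0:
        raise ValueError("provisioned capacity must be positive")
    ratio = usage.used_bytes / usage.provisioned_bytes
    if ratio >= high:
        # Approaching the provisioned limit: requisition another device.
        return "add_device"
    if ratio <= low and usage.provisioned_bytes > usage.device_bytes:
        # Usage fell below the specified proportion and more than one
        # device's worth of capacity is provisioned: one can be released.
        return "remove_device"
    return "no_change"


near_limit = scaling_action(ClusterUsage(100, 90, 50))
under_used = scaling_action(ClusterUsage(100, 20, 50))
single_device = scaling_action(ClusterUsage(50, 10, 50))
```

While an "add_device" request is pending, the larger data storage system could absorb the excess demand, as the paragraph above notes; that fallback is not modeled here.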

The cluster 214 may be configured to be addressable by an external entity (such as through its API, and by the customer entity 204, the data storage system 212, or related processes, systems, or devices) such that any of the constituent data storage devices can serve as an external point of communication of the cluster 214 as a whole. For example, the cluster 214 may be configured as or in a similar fashion to that of a distributed hash ring. As another example, an external (or internal) load balancing method or system may be employed such that a unified external address identifier (e.g., an IP address or similar) can internally (or externally) be changeably directed to any of the constituent data transfer devices of the cluster to process the incoming request, or its eventual reply, for further processing (e.g., using the computational or other capabilities of the cluster).
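One way to picture the hash-ring arrangement is a consistent hash ring in which every device can compute the owner of a key, so a request may enter through any device and be forwarded. This is a generic sketch of consistent hashing, not the cluster's actual routing; `HashRing`, `handle_request`, the device names, and the replica count are assumptions.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List, Tuple


class HashRing:
    """Consistent hash ring mapping keys to device names."""

    def __init__(self, devices: Iterable[str], replicas: int = 64) -> None:
        ring: List[Tuple[int, str]] = []
        for device in devices:
            for i in range(replicas):  # virtual nodes smooth the distribution
                ring.append((self._hash(f"{device}#{i}"), device))
        if not ring:
            raise ValueError("at least one device is required")
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def owner(self, key: str) -> str:
        """Return the device responsible for the key (next point clockwise)."""
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


def handle_request(entry_device: str, key: str, ring: HashRing) -> Dict[str, object]:
    """Any device may receive a request; it forwards to the key's owner."""
    owner = ring.owner(key)
    return {"entry": entry_device, "owner": owner, "forwarded": owner != entry_device}


ring = HashRing(["device-a", "device-b", "device-c"])
via_a = handle_request("device-a", "customer/object-1", ring)
via_c = handle_request("device-c", "customer/object-1", ring)
```

Because ownership depends only on the key and the ring membership, the request resolves to the same device regardless of which device the external entity contacted.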

As may be contemplated, the cluster 214 may be configured (and in some cases optimized) to provide one or more types of capability. Such capabilities may include one or more of the following: reliability, data storage capacity, physical size, computational capacity (e.g., as may be provided by graphics processors via OpenCL or GPGPU, central processing units, specialized ASICs or other specialized processors, network processors, cryptography processors, and the like), durability, throughput (either retrieval or storage), latency (either retrieval or storage), data bandwidth, electrical power consumption/efficiency, and the like. The cluster 214 may be optimized for one or more of these types of capabilities, but still be able to provide other types of capabilities for which it is not necessarily (or not primarily) optimized.

FIG. 3 illustrates an example environment in which a plurality of bundle-encoded shards, and in some cases, data transfer devices, are cycled to improve local availability and/or durability, in accordance with some embodiments.

An initial set of bundles of bundle-encoded shards 318, 320 is configured to overlap. In the illustrated example, the first bundle 318 includes a first identity shard 302, a first derived shard 304, and a second identity shard 306. The second bundle 320 comprises the second identity shard 306 (therefore overlapping the first bundle 318), a second derived shard 308, and a third identity shard 310. The bundle-encoding mechanism and processes are described in greater detail elsewhere herein, such as in FIGS. 7-10 below. While the illustrated example provides a 3:2 bundle-encoded set of shards (e.g., a quorum quantity of two out of a bundle of three shards), other encodings, including different bundle encodings (e.g., four identity shards to each encoded shard in a bundle, where the four identity shards are two pairs of shards) or grid encodings are contemplated hereby.

Initially, a system implementing the cycled storage pattern processes data storage requests such that the identity shards 302, 306, 310 are in a fill pattern 324 that includes a specified order, illustrated in FIG. 3 as identity shards 302, 306, 310. In some embodiments, each shard 302, 304, 306, 308, 310 is associated with a different data transfer device, which may be provisioned by a data storage system, such as an archival data storage service, to provide one or more capabilities of the data storage system or other provisioning entity locally to a customer entity without necessitating a network connection between the provisioned data storage devices and the data storage system (or other provisioning entity). As described in FIG. 2 and elsewhere herein, such provisioned data storage devices may be referred to as a cluster. However, in other embodiments, the shards 302, 304, 306, 308, 310, etc. may not necessarily map one-to-one with the provisioned data transfer devices, and may instead map many-to-one or one-to-many.
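The fill pattern can be sketched as an ordered queue in which writes always go to the leading identity shard, and a shard that cannot accept a write is restricted and the next shard in the order takes over. The class `FillOrder`, the unit-less capacity, and the use of capacity as the only triggering event are simplifying assumptions; the shard labels reuse the reference numerals of FIG. 3 purely as identifiers.

```python
from collections import deque
from typing import Deque, Dict, List


class FillOrder:
    """Directs writes to identity shards in a specified order."""

    def __init__(self, shard_ids: List[str], capacity: int) -> None:
        self.order: Deque[str] = deque(shard_ids)
        self.capacity = capacity
        self.used: Dict[str, int] = {sid: 0 for sid in shard_ids}
        self.restricted: List[str] = []  # shards awaiting transfer/ingestion

    def write(self, size: int) -> str:
        """Place a write on the leading shard; advance when it is full."""
        if size <= 0 or size > self.capacity:
            raise ValueError("write size must fit within one shard")
        while self.order:
            lead = self.order[0]
            if self.used[lead] + size <= self.capacity:
                self.used[lead] += size
                return lead
            # Triggering event in this sketch: the leading shard cannot accept
            # the write, so it is restricted and the next shard becomes leading.
            self.restricted.append(self.order.popleft())
        raise RuntimeError("no identity shard can accept the write")


fill = FillOrder(["302", "306", "310"], capacity=10)
first = fill.write(6)    # lands on the leading shard
second = fill.write(6)   # leading shard lacks room, so the next shard leads
third = fill.write(4)    # continues on the new leading shard
```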

In the illustrated example, when an event associated with identity shard 302 (shown with cross-hatching to illustratively denote inability or restriction from accepting further data) is detected by, e.g., a monitor associated with the first bundle 318, associated data transfer devices, the data storage system 324, or overall cluster 316, the cluster 316 is configured to perform several actions to cycle the layered bundles 318, 320. For example, the leading identity shard 302 that, e.g., caused the event, is transferred to the data storage system 324, such as by an ingestion process of the data storage system 324, so as to store (in some cases durably) the data of the first identity shard 302. In some cases, as described in further detail in the incorporated disclosures, the ingestion process may be initiated via a physical shipment and co-location of a data transfer device on which the identity shard 302 is stored, during which time the identity shard 302 is not available and therefore cannot directly service data retrieval requests for data stored thereon. In such cases, the other two shards 304, 306 of the first bundle can be used to regenerate, via the redundancy code, the data needed to service the data retrieval requests, if the derived shard 304 has not yet been deleted.

The next identity shard 306 in the fill order is used to service further write requests and configured to accept data, in some cases after the receipt of the event but before the identity shard 302 is transferred to the data storage system 324. In some cases, the identity shard 306 (and in some cases, identity shard 310) may already have some data written to it, such as if there was a lapse of time between when the identity shard 302 is restricted from accepting further data and when it is transferred to the data storage system (or when an associated request to transfer the data is received). In such cases, it may be contemplated that, e.g., both derived shards 304 and 308 are updated accordingly. (As illustrated, the unidirectional hatching indicates an example where identity shard 306 is partially written to at the time when a request to transfer the data of restricted/full/unavailable identity shard 302 is received.)

The identity shard 302 and derived shard 304 of the bundle are deleted (or marked as unused/able to be overwritten) 328, 330, e.g., logically from the cluster 316, the layered bundle scheme, or in some cases, from the data transfer device on which they are stored, after the data of the identity shard 302 is successfully stored (e.g., durably, in some cases) on the receiving data storage system 324, which may in some cases be an archival data storage service. Once such storage is verified, in embodiments where the data transfer device is reused in the cluster and/or the layered bundle scheme, the data storage devices (now being clear of the shards 302, 304), are reinserted into the cluster, and the layering scheme is updated to include a new bundle 322 that includes “new” identity shard 314, “new” derived shard 312, and overlaps with bundle 320 by virtue of including the identity shard 310 of the bundle 320. Additionally, the “new” identity shard 314 is inserted at the trailing end of the fill order 324.

If a retrieval request for data in identity shard 302 is received by the cluster 316 after deletion 328, 330 of shards 302, 304, the data storage system 324 on which the data was stored (e.g., via the ingestion process) may be used to service the request. As illustrated, and as previously mentioned, upon occurrence/initiation/detection of the event causing the transfer of the data from identity shard 302 to the data storage system 324, the cluster 316 directs data storage requests to the next identity shard in the fill order 324, in some cases before performing any other substantive transfer actions.

As each identity shard fills in succession, the associated derived shard 308 is calculated in response. For example, as discussed in the incorporated disclosures and in FIGS. 7-10 below, if identity shard 310 is partially populated with data and thus identity shard 314 contains no data, the derived shard 312 may simply include a copy of the partially populated identity shard 310. Meanwhile, as identity shard 314 is populated, the derived shard 312 may be continuously updated to include processed data (e.g., by the implemented redundancy code(s)) that uses data from both the identity shard 310 and the identity shard 314 as inputs, thereby guaranteeing consistent redundancy of both the data within identity shard 310 and identity shard 314, even if an event causes identity shard 310 to be removed from the bundle (or to otherwise become unavailable).
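By way of non-limiting illustration, the relationship between a partially populated identity shard, an empty identity shard, and their derived shard may be sketched as follows. The sketch assumes a simple XOR parity code and zero-padded, fixed-size shards; the disclosure does not prescribe a particular redundancy code, and the function name derive_shard and the constant SHARD_SIZE are hypothetical.

```python
# Illustrative sketch only: XOR parity stands in for the unspecified
# redundancy code, and shards are zero-padded to a fixed size.
SHARD_SIZE = 8  # hypothetical shard size in bytes


def derive_shard(shard_a: bytes, shard_b: bytes, size: int = SHARD_SIZE) -> bytes:
    """Return the XOR of two shards, each zero-padded to `size` bytes."""
    if len(shard_a) > size or len(shard_b) > size:
        raise ValueError("shard exceeds the configured shard size")
    a = shard_a.ljust(size, b"\x00")
    b = shard_b.ljust(size, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


# Identity shard 310 is partially populated; identity shard 314 is empty.
shard_310 = b"abc"
shard_314 = b""
derived_312 = derive_shard(shard_310, shard_314)

# With shard 314 empty, the derived shard is a (padded) copy of shard 310.
copy_of_310 = shard_310.ljust(SHARD_SIZE, b"\x00")
is_plain_copy = derived_312 == copy_of_310

# As shard 314 is populated, the derived shard is recomputed from both inputs.
shard_314 = b"xy"
derived_312 = derive_shard(shard_310, shard_314)

# If shard 310 becomes unavailable, it is regenerated from the other two shards.
recovered_310 = derive_shard(shard_314, derived_312)
```

Under this assumed code, any one shard of the three-shard bundle can be regenerated from the remaining two, which is the property relied upon when an identity shard is in transit.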

Events may include hardware, software, and/or cryptographic errors potentially affecting data stored in identity shards associated with such hardware, software, and/or cryptographic schemes, inability to accept additional data (e.g., reaching or approaching capacity, in some cases approaching a predetermined/prespecified threshold or percentage of capacity of the identity shard or its associated durable storage/data transfer device), request of a requestor (e.g., request of a customer entity associated with the cluster, request of the parent data storage system to, e.g., perform one or more computational actions on the data in the shard), time elapsed (e.g., since the data was stored in the identity shard, since the data was last accessed and/or modified, and the like), and the like. In some embodiments, the cycling of shards and/or data transfer devices may be triggered by the occurrence of more than just one event (and in some cases, require more than one event to occur before taking further action).
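As a minimal sketch of how a monitor might evaluate the events enumerated above, the following example checks a shard's status against several illustrative conditions and optionally requires more than one event before cycling. The class ShardStatus, the functions detected_events and should_cycle, and the default threshold values are hypothetical and are not drawn from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ShardStatus:
    """Hypothetical snapshot of an identity shard, for illustration only."""
    used_bytes: int
    capacity_bytes: int
    has_error: bool = False            # hardware/software/cryptographic error
    transfer_requested: bool = False   # request of a customer or parent system
    age_seconds: float = 0.0           # time since data was stored or accessed


def detected_events(status, fill_threshold=0.9, max_age_seconds=float("inf")):
    """List the illustrative events currently present for a shard."""
    events = []
    if status.has_error:
        events.append("error")
    if status.used_bytes >= fill_threshold * status.capacity_bytes:
        events.append("capacity")
    if status.transfer_requested:
        events.append("request")
    if status.age_seconds >= max_age_seconds:
        events.append("age")
    return events


def should_cycle(status, min_events=1, **thresholds):
    """Cycle only once at least `min_events` distinct events are detected."""
    return len(detected_events(status, **thresholds)) >= min_events


nearly_full = ShardStatus(used_bytes=95, capacity_bytes=100)
```

Setting min_events to a value greater than one corresponds to embodiments in which more than one event must occur before further action is taken.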

In some embodiments, data transfer devices used to store derived shards 304, 308, may be provisioned such that, after being cleared, they may only be reprovisioned to store new derived shards (e.g., 312), rather than arbitrarily reprovisioned to accept any shard (e.g., either shard 312 or 314). Similarly, data transfer devices used to store identity shards 302, 306, 310, may only be reprovisioned for use to store new identity shards (e.g., 314). In such embodiments, the particular “derived” or “identity” data transfer devices may be optimized for storage and/or processing of such shards and/or data associated therewith, as, for example, data transfer devices used to store and/or generate derived shards may benefit from additional computational power relative to those optimized to store/process identity shards, and data transfer devices used to store and/or generate identity shards may benefit from more durable, available, reliable, or capacious hardware (e.g., storage media, storage controllers, additional redundancy coding, etc.).

FIG. 4 illustrates an example process for storing redundancy coded data in a hybridized data storage system, in accordance with some embodiments.

At step 402, a data storage system, such as that described in further detail herein and in the incorporated disclosures, is configured (e.g., via API) to store and apportion a set of redundancy coded shards such that some of the shards are allocated to data transfer devices that are external to the data storage system. The shard set may include null shards that allow for later expansion of the bundle to, e.g., include additional data transfer devices or additional data storage systems within the same bundle.

At step 404, the bundle of shards is generated according to the configuration calculated in step 402, and includes both identity and derived shards. In some embodiments, the data storage system centralizes the generation of the shards, while in other embodiments, the data transfer devices perform some or all of the computations to generate the shards (e.g., for those to be stored on the data transfer devices themselves).

At step 406, the generated shards are apportioned between the data transfer devices and the data storage system such that more shards are allocated to the data storage system than to the external devices or systems, so as to prevent the data storage system from having too few shards to regenerate the data resident in the shards apportioned to the external data transfer devices. After apportioning, at step 408, the data transfer devices are provisioned, e.g., to customer locations external to (separate from) that of the data storage system, such as by physical shipment, so as to facilitate direct loading by a customer entity of customer data onto the data transfer devices.

At step 410, after receiving the customer data from the provisioned data transfer devices, e.g., over the network or via some other ingestion process as described in the incorporated references, the derived shards (e.g., those resident on the data storage system) are updated by the data storage system to reflect the change in the bundle's content (e.g., to include encoded forms of the customer data).

After steps 402-410, if data resident on the data transfer devices becomes unavailable, e.g., while the data transfer devices are in transit or if they malfunction, the customer data may be regenerated from a quorum quantity of shards remaining in the bundle, such as those apportioned to the data storage system.
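The apportionment constraint of step 406 may be illustrated with the following minimal sketch, which assumes that the bundle is characterized by a quorum quantity (the minimum number of shards sufficient to regenerate the data). The function name apportion, its parameters, and the shard labels are hypothetical.

```python
def apportion(shard_ids, quorum, external_count):
    """Split a bundle between external devices and the data storage system.

    Illustrative only: the split is rejected unless the data storage system
    keeps more shards than are sent out and keeps at least a quorum.
    """
    if not 0 <= external_count <= len(shard_ids):
        raise ValueError("external_count is out of range")
    retained_count = len(shard_ids) - external_count
    if retained_count <= external_count:
        raise ValueError("the data storage system must retain more shards")
    if retained_count < quorum:
        raise ValueError("too few shards retained to regenerate the data")
    external = list(shard_ids[:external_count])
    retained = list(shard_ids[external_count:])
    return external, retained


# A five-shard bundle with a quorum of three: two shards may be shipped.
external, retained = apportion(["s0", "s1", "s2", "s3", "s4"], quorum=3, external_count=2)
```

In this example, the three retained shards suffice to regenerate the two external shards if the data transfer devices are lost or delayed in transit.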

FIG. 5 illustrates an example process for scalably provisioning a plurality of data transfer devices to provide data-related functionality, in accordance with some embodiments.

At step 502, a data storage system is configured to interface with a variable quantity of data transfer devices, such as to incorporate such data transfer devices as an extension of one or more capabilities provided by the data storage system. At step 504, one or more data transfer devices are provisioned to a customer location (which may be external to that of the data storage system) in accordance with a resource request, e.g., of the customer.

At step 506, the data transfer devices are configured to directly interface and generate, in a manner similar to the manner of the data storage system, a plurality of bundle-encoded shards for storage on the provisioned data transfer devices, which may form a cluster that is, e.g., durable and highly available, as well as locally accessible to a given customer entity. The generated bundle of shards is stored across the cluster of data transfer devices in distributed fashion, e.g., to preserve durability characteristics of the encoded data, at step 508, and at step 510, requests for storage of further data and retrieval of stored and encoded data are received and serviced directly by the cluster of data transfer devices, rather than the data storage system. In some embodiments, as previously discussed, requests directed to the data storage system may be routed to, and serviced in whole or in part by, the provisioned cluster of data transfer devices.

At step 512, in accordance with a new or updated customer resource request, additional data transfer devices are provisioned to the cluster (e.g., if additional storage, transactional, or computational capability is requested) or existing devices within the cluster are removed (e.g., if there is excess capacity), and in some embodiments, returned to the location of the data storage system. At step 514, if such provisioning or deprovisioning is accompanied by a change in shard allocation or bundle size to the cluster (e.g., on account of a change in the size of the cluster), the bundle of shards is updated by the data storage system or one or more nodes of the cluster to reflect the change in membership, content, etc. Such updates may include recalculation of one or more encoded shards within the bundle.

FIG. 6 illustrates an example process for cycling data transfer devices provisioned to an entity remote from a data storage system, in accordance with some embodiments.

At step 602, an entity, such as a monitoring entity associated with an implementing cluster of data transfer devices, or a provisioning data storage system or archival data storage service, monitors at least a leading identity shard of a layered, cycled bundling scheme to which the leading identity shard belongs, for one or more events or conditions as described in detail in at least FIG. 3 above. Such events may include the approaching of data capacity associated with the identity shard, hardware/software/cryptographic malfunctions associated with the hardware, and the like.

At decision point 604, if the event or condition is not detected, at step 606, data storage requests continue to be directed to and fulfilled by the leading identity shard in the fill order. Otherwise, at step 608, data storage requests are directed to and fulfilled by the next identity shard in the fill pattern, which in some embodiments is the other identity shard in the same bundle (and therefore the “leading” identity shard in an overlapping bundle).

At step 610, after data storage requests are directed to the next identity shard in the fill pattern at step 608, a data transfer device is used to transfer, and to initiate commission, processing, and/or storage of, the data of the associated leading identity shard on the data storage system. In some cases, step 610 is initiated via physical shipment and/or co-location of the data transfer device on which the identity shard is stored with an ingestor of the data storage system. Co-location, in this and other contexts used in this disclosure, includes the physical presence and proximity of devices or systems with each other, in some cases such that they are physically interconnected so as to transfer data, such as over a local area network. In other embodiments, such data is transmitted over a network to the data storage system, such as via the Internet, or other common network, while the data transfer device remains in place and physically remote from (e.g., not co-located with) the data storage system. In contexts within this disclosure where the data transfer device(s) is/are remote from the data storage system, they may be physically remote, as just mentioned, or logically remote, in the sense that the data transfer device(s) and the data storage system reside on different, unconnected networks (e.g., “air gapped”), thus rendering irrelevant the physical proximity thereof.

At decision point 612, if the data transferred at step 610 is not verified as successfully (and in some cases, durably) stored on the receiving data storage system, the other shards in the same bundle remain available at step 614 to regenerate, via a redundancy code, the data of the identity shard in question, either to service retrieval requests associated therewith, or to recreate the identity shard (e.g., for replacement). In some embodiments, the transfer/storage/processing of step 610 is retried until it is successful.

However, if the data is successfully (e.g., durably, verifiably) committed to the data storage system at decision point 612, at step 616, the associated derived shard (e.g., data transfer device associated therewith) is deleted, and a new derived shard is placed (e.g., by the cluster) along with a new identity shard (step 618) to form a new overlapping bundle (step 620) that includes the new derived shard, the new identity shard, and the last identity shard of the fill pattern prior to step 618. The fill pattern is also updated to reflect the new identity shard, which is placed at the trailing end of the fill pattern. In some embodiments, the process 600 repeats indefinitely, and “old” identity shards are “recycled” into “new” identity shards (and “old” derived shards are recycled into “new” derived shards) on an ongoing, cycling basis.
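The bookkeeping of steps 608 through 620 may be sketched, purely for illustration, as operations on a fill order and a list of overlapping bundles. The sketch tracks shard labels only (no data is encoded or transferred), uses the reference numerals of FIG. 3 as labels, and the class name CycledCluster and its methods are hypothetical.

```python
from collections import deque


class CycledCluster:
    """Illustrative bookkeeping for a layered, cycled bundling scheme.

    Each bundle is a tuple (identity, derived, identity); adjacent bundles
    overlap by sharing one identity shard.
    """

    def __init__(self, bundles):
        self.bundles = list(bundles)
        # The fill order lists the identity shards, leading shard first.
        self.fill_order = deque([bundles[0][0]] + [b[2] for b in bundles])
        self.write_target = self.fill_order[0]

    def on_event(self):
        """Step 608: direct further writes to the next identity shard."""
        self.write_target = self.fill_order[1]

    def complete_cycle(self, transfer_verified, new_derived, new_identity):
        """Steps 612-620: retire the leading bundle once storage is verified."""
        if not transfer_verified:
            # Step 614: keep the bundle so the data can still be regenerated.
            return False
        self.fill_order.popleft()   # leading identity shard leaves the order
        self.bundles.pop(0)         # step 616: its derived shard is deleted
        trailing = self.fill_order[-1]
        # Steps 618-620: the new bundle overlaps the trailing identity shard.
        self.bundles.append((trailing, new_derived, new_identity))
        self.fill_order.append(new_identity)
        return True


cluster = CycledCluster([(302, 304, 306), (306, 308, 310)])
cluster.on_event()
first_attempt = cluster.complete_cycle(False, new_derived=312, new_identity=314)
second_attempt = cluster.complete_cycle(True, new_derived=312, new_identity=314)
```

After the verified transfer in this example, the bundles correspond to bundles 320 and 322 of FIG. 3, and the new identity shard sits at the trailing end of the fill order.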

FIG. 7 illustrates an example environment in which a computing resource service provider implements a data storage service, such as a grid storage service, to process and store data transacted therewith, in accordance with some embodiments.

A customer, via a customer device 702, may connect via a network 704 to one or more services 706 provided by a computing resource service provider 718. In some embodiments, the computing resource service provider 718 may provide a distributed, virtualized, and/or datacenter environment within which one or more applications, processes, services, virtual machines, and/or other such computer system entities may be executed. In some embodiments, the customer may be a person, or may be a process running on one or more remote computer systems, or may be some other computer system entity, user, or process. The customer device 702 and the network 704 may be similar to that described in connection with at least FIG. 1 above.

The command or commands to connect to the computer system instance may originate from an outside computer system and/or server, or may originate from an entity, user, or process on a remote network location, or may originate from an entity, user, or process within the computing resource service provider, or may originate from a user of the customer device 702, or may originate as a result of an automatic process or may originate as a result of a combination of these and/or other such origin entities. In some embodiments, the command or commands to initiate the connection to the computing resource service provider 718 may be sent to the services 706, without the intervention of the user of the services 706. The command or commands to initiate the connection to the services 706 may originate from the same origin as the command or commands to connect to the computing resource service provider 718 or may originate from another computer system and/or server, or may originate from a different entity, user, or process on the same or a different remote network location, or may originate from a different entity, user, or process within the computing resource service provider, or may originate from a different user of the customer device 702, or may originate as a result of a combination of these and/or other such same and/or different entities.

The customer device 702 may request connection to the computing resource service provider 718 via one or more connections and, in some embodiments, via one or more networks 704 and/or entities associated therewith, such as servers connected to the network, either directly or indirectly. The customer device 702 that requests access to the services 706 may, as previously discussed, include any device that is capable of connecting with a computer system via a network, including at least servers, laptops, mobile devices such as smartphones or tablets, other smart devices such as smart watches, smart televisions, set-top boxes, video game consoles and other such network-enabled smart devices, distributed computer systems and components thereof, abstracted components such as guest computer systems or virtual machines and/or other types of computing devices and/or components. The network 704, also as previously discussed, may include, for example, a local network, an internal network, a public network such as the Internet, or other networks such as those listed or described herein. The network may also operate in accordance with various protocols such as those listed or described herein.

The computing resource service provider 718 may provide access to one or more host machines as well as provide access to services such as virtual machine (VM) instances, automatic scaling groups, or file-based database storage systems as may be operating thereon. The services 706 may connect to or otherwise be associated with one or more storage services such as those described herein (e.g., the data storage service 714). The storage services may be configured to provide data storage for the services 706. In an embodiment, the computing resource service provider 718 may provide direct access to the one or more storage services for use by users and/or customers of the computing resource service provider. The storage services may manage storage of data on one or more block storage devices and/or may manage storage of data on one or more archival storage devices such as, for example, magnetic tapes.

For example, the computing resource service provider 718 may provide a variety of services 706 to the customer device 702, which may in turn communicate with the computing resource service provider 718 via an interface, which may be a web service interface, application programming interface (API), user interface, or any other type of interface. The services 706 provided by the computing resource service provider 718 may include, but may not be limited to, a virtual computer system service, a block-level data storage service, a cryptography service, an on-demand data storage service, a notification service, an authentication service, a policy management service, an archival storage service, a durable data storage service such as the data storage service 714, and/or other such services. Each of the services 706 provided by the computing resource service provider 718 may include one or more web service interfaces that enable the customer device 702 to submit appropriately configured API calls to the various services through web service requests. In addition, each of the services may include one or more service interfaces that enable the services to access each other (e.g., to enable a virtual computer system of the virtual computer system service to store data in or retrieve data from the on-demand data storage service or the data storage service 714, and/or to access one or more block-level data storage devices provided by the block-level data storage service).

The block-level data storage service may comprise one or more computing resources that collectively operate to store data for a user using block-level storage devices (and/or virtualizations thereof). The block-level storage devices of the block-level data storage service may, for example, be operationally attached to virtual computer systems provided by a virtual computer system service to serve as logical units (e.g., virtual drives) for the computer systems. A block-level storage device may enable the persistent storage of data used or generated by a corresponding virtual computer system where the virtual computer system service may be configured to only provide ephemeral data storage.

The computing resource service provider 718 may also include an on-demand data storage service. The on-demand data storage service may be a collection of computing resources configured to synchronously process requests to store and/or access data. The on-demand data storage service may operate using computing resources (e.g., databases) that enable the on-demand data storage service to locate and retrieve data quickly, to allow data to be provided in response to requests for the data. For example, the on-demand data storage service may maintain stored data in a manner such that, when a request for a data object is received, the data object can be provided (or streaming of the data object can be initiated) in a response to the request. As noted, data stored in the on-demand data storage service may be organized into data objects. The data objects may have arbitrary sizes except, perhaps, for certain constraints on size. Thus, the on-demand data storage service may store numerous data objects of varying sizes. The on-demand data storage service may operate as a key value store that associates data objects with identifiers of the data objects that may be used by the user to retrieve or perform other operations in connection with the data objects stored by the on-demand data storage service.

Note that, unless otherwise specified, use of expressions regarding executable instructions (also referred to as code, applications, agents, etc.) performing operations that instructions do not ordinarily perform unaided (e.g., transmission of data, calculations, etc.) in the context of describing disclosed embodiments denote that the instructions are being executed by a machine, thereby causing the machine to perform the specified operations.

The services 706 may produce data, such as data 708 received from the customer device 702, which may be stored 722 in the preliminary storage 712 as described above. In some embodiments, as previously mentioned, the data stored in the preliminary storage may be stored in unaltered form, such as in an identity shard. While the data is stored in the preliminary storage 712, the data 722 may be accessed by the services 706 (e.g., as a result of one or more API requests by the customer device 702) from the preliminary storage 712. After a determined period 720, such as described above in connection with FIG. 1, has passed and the data is migrated to a data storage service 714 provided by the computing resource service provider 718, the data may be accessed using the data storage service 714. In an embodiment where the data may be stored using a redundancy encoding technique such as those described herein, the data storage service 714 may retrieve the data from any of the data volumes 716 and/or may reconstruct the data using the redundancy encoding techniques. The data volumes 716 may be magnetic tape, may be optical disks, or may be some other such storage media. As previously discussed and as further discussed herein, the data may be stored in identity shards that correspond individually to volumes, and may also be processed (using the redundancy encoding techniques) so as to create derived shards.

The data storage service 714 may store the data 722 in the preliminary storage 712 or may transmit a command that causes a different service (e.g., a block storage service or some other storage service such as those described herein) to store the data 722 in the preliminary storage 712. The data storage service 714 may also cause the data to be migrated from the preliminary storage 712 or may transmit a command that causes a different service to cause the data to be migrated from the preliminary storage 712. The data storage service 714 may also transmit a command or commands to cause a different service to perform other operations associated with making data objects eventually durable including, but not limited to, storing the data objects in the data shards, calculating derived shards, updating bundles, updating grids (i.e., updating horizontal, vertical, and other bundles of multiply bundled data), and/or other such operations.

In an embodiment, the preliminary storage 712 is a data storage volume such as, for example, a disk drive (e.g., a spinning magnetic disk drive or a solid state disk drive), computer system memory, magnetic tape, an optical storage device, or some other storage device. In another embodiment, the preliminary storage 712 is a virtual and/or shared data storage volume that is mapped to a physical storage volume such as, for example, a disk drive, a solid state disk drive, computer system memory, magnetic tape, an optical storage device, or some other storage device. As may be contemplated, the types of data storage volumes used for the preliminary storage 712 described herein are illustrative examples and other types of data storage volumes used for the preliminary storage 712 may be considered as within the scope of the present disclosure.

In an embodiment, the preliminary storage 712 is a plurality of storage devices that are used to redundantly store the data using techniques such as, for example, bundle encoding, grid encoding, or replicated storage. For example, the preliminary storage 712 may store the data by distributing the data to a plurality of data shards (e.g., putting a first portion of the data in a first data shard and a second portion of the data in a second data shard) and generating one or more derived shards based on those data shards. In another embodiment, the preliminary storage 712 is one or more storage devices that store redundant copies of the data as received. In yet another embodiment, the preliminary storage uses a combination of the storage techniques described herein by, for example, storing a single copy of the data for a first time period (e.g., thirty minutes), storing multiple copies of the data for a second time period (e.g., one day), using redundant storage techniques such as grid or bundle encoding to store the data for a third time period (e.g., thirty days), and then moving the data to more durable storage 716 using the data storage service 714 as described herein.
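The combined, time-tiered embodiment may be sketched as a lookup from the age of the data to a storage technique. The sketch assumes that the example time periods run consecutively, which the description above does not state explicitly; the names TIERS and storage_technique are hypothetical.

```python
from datetime import timedelta

# Example durations from the description, assumed here to run back to back.
TIERS = [
    (timedelta(minutes=30), "single copy"),
    (timedelta(days=1), "multiple copies"),
    (timedelta(days=30), "grid or bundle encoding"),
]


def storage_technique(age: timedelta) -> str:
    """Return the illustrative technique applicable to data of a given age."""
    boundary = timedelta(0)
    for duration, technique in TIERS:
        boundary += duration
        if age < boundary:
            return technique
    # Past the final period, the data is moved to more durable storage.
    return "migrate to durable storage"


technique_for_new_data = storage_technique(timedelta(minutes=10))
```

Other embodiments may measure each period from the time the data was first received rather than consecutively, in which case only the boundary calculation changes.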

The set of data may be stored in the preliminary storage 712 in an unaltered form (e.g., not processed, compressed, indexed, or altered prior to storage). The set of data may also be stored in the preliminary storage 712 as, for example, original data (also referred to herein as an “identity shard”) such as the original data shards described herein. In an embodiment, the set of data stored in the preliminary storage 712 is stored without indexing and without any redundancy encoding. In another embodiment, the set of data stored in the preliminary storage 712 is stored with null redundancy encoding (i.e., a redundancy encoding that maps the data to itself). The data in preliminary storage may be stored as raw data, or may be bundle-encoded, or may be grid-encoded, or may be stored using some other method.

In an embodiment, data can be migrated from preliminary storage to the data storage service 714 as a result of an event such as, for example, a request by a customer to store the data in the data storage service 714. Other events may also be used to cause the migration of the data from preliminary storage 712 to the data storage service 714 such as, for example, events generated by a process, module, service, or application associated with the customer or associated with a computing resource service provider. In an illustrative example, a block storage service may maintain data storage in preliminary storage for a running virtual machine instance and, upon termination of the instance, may generate an event to migrate some or all of the data from preliminary storage to durable storage. The triggering event that causes the migration of data from preliminary storage may also be combined with an elapsed time as described above so that, for example, data may be stored in preliminary storage until an event occurs, but the data may also be migrated from preliminary storage if no event occurs prior to the elapsed time. As may be contemplated, the criteria for initiating the migration from preliminary storage described herein are illustrative examples and other such criteria for initiating the migration from preliminary storage may be considered as within the scope of the present disclosure.

As used herein, the durability of a data object may be understood to be an estimate of the probability that the data object will not unintentionally become permanently irretrievable (also referred to herein as “unavailable”). This durability is an estimated probability and is generally expressed as a percentage (e.g., 99.9999 percent). This durability is based on assumptions of probabilities of certain failures (e.g., the annualized failure rate (AFR) of drives used to store the data) and may be based on an average failure rate, a maximum failure rate, a minimum failure rate, a mean failure rate, or some other such failure rate. The durability may be based on a statistical average of the failure over a collection of drives when there are many different drives and/or when there are many different types of drives. The durability may also be based on historical measurements of the failure of drives and/or statistical sampling of the historical measurements of the failure of drives. The durability may also be correlated with the probability that a data object will not unintentionally become unavailable such as, for example, basing the durability on the probability that a data object will unintentionally become unavailable. As may be contemplated, the methods of determining durability of data described herein are merely illustrative examples and other such methods of determining durability of data may be considered as within the scope of the present disclosure.
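As one hedged illustration of a durability estimate, and not the method of any particular embodiment, the following sketch computes the probability that a bundle remains recoverable when each of its shards is lost independently with the same probability over some period. Independence and a uniform loss probability are simplifying assumptions, and the function name bundle_durability and its parameters are hypothetical.

```python
from math import comb


def bundle_durability(total_shards: int, quorum: int, loss_probability: float) -> float:
    """Probability that at least `quorum` of `total_shards` shards survive.

    Assumes independent shard losses with a common probability, which real
    drive populations only approximate.
    """
    tolerable_losses = total_shards - quorum
    return sum(
        comb(total_shards, lost)
        * loss_probability ** lost
        * (1.0 - loss_probability) ** (total_shards - lost)
        for lost in range(tolerable_losses + 1)
    )


# Three shards, any two of which suffice, each lost with probability 1%.
example_durability = bundle_durability(total_shards=3, quorum=2, loss_probability=0.01)
```

For this example, the bundle is lost only if two or more of the three shards are lost, so the estimate is 1 - (3 x 0.01^2 x 0.99 + 0.01^3) = 0.999702, or about 99.97 percent.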

In an embodiment, a separate service 710 can be configured to monitor the elapsed time 720 associated with the data objects in preliminary storage 712 and, based on a desired durability, cause the data storage service 714 to cause the data to be migrated from the preliminary storage 712 to the durable storage by, for example, transmitting a message to the data storage service. This separate service may operate asynchronously to enforce time limits for all such data stored in preliminary storage.

FIG. 8 illustrates an example environment 800 where a redundancy encoding technique is applied to data stored in durable storage as described in connection with FIG. 5 and in accordance with an embodiment. The redundancy encoding technique illustrated in FIG. 8 is an example of a grid encoding technique wherein each identity shard is part of a first set of one or more identity shards which may be bundled with one or more derived shards in a first group or bundle (i.e., in one dimension or direction) and each identity shard is also part of at least a second set of one or more identity shards which may be bundled with one or more other derived shards in a second bundle or group (i.e., in a second dimension or direction). As is illustrated in FIG. 8, a grid encoding technique is often implemented as a two-dimensional grid, with each shard being part of two bundles (i.e., both “horizontal” and “vertical” bundles). However, a grid encoding technique may also be implemented as a three-dimensional grid, with each shard being part of three bundles, or a four-dimensional grid, with each shard being part of four bundles, or as a larger-dimensional grid. Additional details of grid encoding techniques are described in U.S. patent application Ser. No. 14/789,783, filed Jul. 1, 2015, entitled “GRID ENCODED DATA STORAGE SYSTEMS FOR EFFICIENT DATA REPAIR,” which is incorporated by reference herein.

In the example illustrated in FIG. 8, data 802 from preliminary storage is provided for storage in durable storage using a redundancy encoding technique with both horizontal derived shards and vertical derived shards. In the example illustrated in FIG. 8, a first datacenter 812 may contain data shards (denoted as a square shard with the letter “I”), horizontal derived shards (denoted as a triangular shard with the Greek letter “δ” or delta), and vertical derived shards (denoted as an inverted triangle with the Greek letter “δ”) all of which may be stored on durable storage volumes within the first datacenter 812. A second datacenter 814, which may be geographically and/or logically separate from the first datacenter 812, may also contain data shards, horizontal derived shards, and/or vertical derived shards. A third datacenter 816, which may be geographically and/or logically separate from the first datacenter 812 and from the second datacenter 814, may also contain data shards, horizontal derived shards, and/or vertical derived shards. As illustrated in FIG. 8, each of the three datacenters may be a single vertical bundle. In an embodiment, each of the datacenters can include multiple vertical bundles. As may be contemplated, the number of datacenters illustrated in FIG. 8 and/or the composition of the datacenters illustrated in FIG. 8 are merely illustrative examples and other numbers and/or compositions of datacenters may be considered as within the scope of the present disclosure. The datacenters may be co-located or may be located in one or more separate datacenter locations.

In the example illustrated in FIG. 8, the data 802 may be copied to a data shard 804 and, as a result of the change to the data in the data shard 804, a horizontal derived shard 806 associated with the data shard 804 may be updated so that the horizontal derived shard 806 may be used to reconstruct the data shard 804 in the event of a loss of the data shard 804. In the example illustrated in FIG. 8, the three shards enclosed by the dotted line (e.g., the data shard 804, the data shard 820, and the horizontal derived shard 806) are a horizontal bundle 818. In this example, the data shard 820 is not affected by the changes to the data shard 804 but the horizontal derived shard 806 may need to be updated as a result of the changes to the data shard 804.

Also as a result of the change to the data in the data shard 804, one or more vertical derived shards 808 related to the data shard 804 may also be updated so that the vertical derived shards 808 may be used to reconstruct the data shard 804 in the event of a loss of the data shard 804 and the horizontal derived shard 806. In the example illustrated in FIG. 8, the shards in datacenter 812 form a vertical bundle. In this example, the other data shards 822 in the vertical bundle and/or the horizontal derived shards 824 in the vertical bundle are not affected by the changes to the data shard 804 but the vertical derived shards 808 may need to be updated as a result of the changes to the data shard 804. Finally, as a result of the change to the horizontal derived shard 806, one or more vertical derived shards 810 related to the horizontal derived shard 806 in the vertical bundle in datacenter 816 may also be updated so that the vertical derived shards 810 may be used to reconstruct the horizontal derived shard 806 in the event of a loss of the horizontal derived shard 806 and the data shard 804.
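The update propagation described above can be sketched as follows. This is a minimal illustration, not the encoding of FIG. 8: it assumes a single XOR parity shard per bundle as a stand-in for whatever redundancy code an embodiment applies, and the class and method names (`Grid`, `write`, `rebuild_from_row`, `rebuild_from_column`) are hypothetical.

```python
def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)

class Grid:
    """rows x cols identity shards with XOR parity in both directions.

    Assumption: one horizontal derived shard per row and one vertical
    derived shard per column, including the column that holds the
    horizontal derived shards.
    """

    def __init__(self, rows: int, cols: int, size: int):
        zero = bytes(size)
        self.data = [[zero] * cols for _ in range(rows)]
        self.horizontal = [zero] * rows        # one per horizontal bundle
        self.vertical = [zero] * (cols + 1)    # last entry covers the parity column

    def write(self, row: int, col: int, new: bytes) -> None:
        delta = xor(self.data[row][col], new)
        self.data[row][col] = new
        # The horizontal derived shard of the affected row changes.
        self.horizontal[row] = xor(self.horizontal[row], delta)
        # The vertical derived shard of the affected column changes.
        self.vertical[col] = xor(self.vertical[col], delta)
        # Because the horizontal derived shard changed, so does the
        # vertical derived shard that protects it.
        self.vertical[-1] = xor(self.vertical[-1], delta)

    def rebuild_from_row(self, row: int, col: int) -> bytes:
        others = [s for j, s in enumerate(self.data[row]) if j != col]
        return xor(self.horizontal[row], *others)

    def rebuild_from_column(self, row: int, col: int) -> bytes:
        others = [r[col] for i, r in enumerate(self.data) if i != row]
        return xor(self.vertical[col], *others)

grid = Grid(rows=2, cols=2, size=4)
grid.write(0, 0, b"abcd")
grid.write(0, 1, b"wxyz")
grid.write(1, 0, b"1234")
assert grid.rebuild_from_row(0, 0) == b"abcd"
assert grid.rebuild_from_column(0, 0) == b"abcd"
```

A write to one identity shard touches three derived shards and leaves every other identity shard untouched, and the lost shard can be rebuilt along either dimension.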

FIG. 9 illustrates an example environment 900 where a redundancy encoding technique is applied to data stored in durable storage as described herein and in accordance with at least one embodiment. The redundancy encoding technique illustrated in FIG. 9 is an example of a bundle encoding technique wherein one or more identity shards (also referred to herein as “data shards”) may be bundled with one or more derived shards in a single group or dimension. Additional details of bundle encoding techniques are described in U.S. patent application Ser. No. 14/741,409, filed Jun. 16, 2015, entitled “ADAPTIVE DATA LOSS MITIGATION FOR REDUNDANCY CODING SYSTEMS,” which is incorporated by reference herein.

Data 902 from preliminary storage may be sent to a data storage system 904 for redundant storage. The data 902 may be provided from the preliminary storage by any entity capable of transacting data with a data storage system, such as over a network (including the Internet). Examples include physical computing systems (e.g., servers, desktop computers, laptop computers, thin clients, and handheld devices such as smartphones and tablets), virtual computing systems (e.g., as may be provided by the computing resource service provider using one or more resources associated therewith), services (e.g., such as those connecting to the data storage system 904 via application programming interface calls, web service calls, or other programmatic methods), and the like.

The data storage system 904 may be any computing resource or collection of such resources capable of processing data for storage, and interfacing with one or more resources to cause the storage of the processed data. Examples include physical computing systems (e.g., servers, desktop computers, laptop computers, thin clients, and handheld devices such as smartphones and tablets), virtual computing systems (e.g., as may be provided by the computing resource service provider using one or more resources associated therewith), services (e.g., such as those connecting to the data storage system 904 via application programming interface calls, web service calls, or other programmatic methods), and the like. In some embodiments, the resources of the data storage system 904, as well as the data storage system 904 itself, may be one or more resources of a computing resource service provider, such as that described in further detail below. In some embodiments, the data storage system 904 and/or the computing resource service provider provides one or more archival storage services and/or data storage services, such as those described herein, through which a client entity may provide data such as the data 902 for storage in preliminary storage and/or the data storage system 904.

Data 902 may include any quantity of data in any format. For example, the data 902 may be a single file or may include several files. The data 902 may also be encrypted by, for example, a component of the data storage system 904 after the receipt of the data 902 in response to a request made by a customer of the data storage system 904 and/or by a customer of the computing resource service provider.

The data storage system 904 may sort one or more identity shards according to one or more criteria (and in the case where a plurality of criteria is used for the sort, such criteria may be sorted against sequentially and in any order appropriate for the implementation). Such criteria may be attributes common to some or all of the archives, and may include the identity of the customer, the time of upload and/or receipt (by the data storage system 904), archive size, expected volume and/or shard boundaries relative to the boundaries of the archives (e.g., so as to minimize the number of archives breaking across shards and/or volumes), and the like. As mentioned, such sorting may be performed so as to minimize the number of volumes on which any given archive is stored. Such techniques may be used, for example, to optimize storage in embodiments where the overhead of retrieving data from multiple volumes is greater than the benefit of parallelizing the retrieval from the multiple volumes. Information regarding the sort order may be persisted, for example, by the data storage system 904, for use in techniques described in further detail herein.

As previously discussed, in some embodiments, one or more indices may be generated in connection with, for example, the order in which the archives are to be stored, as determined in connection with the sorting mentioned immediately above. The index may be a single index or may be a multipart index, and may be of any appropriate architecture and may be generated according to any appropriate method. For example, the index may be a bitmap index, dense index, sparse index, or a reverse index. Embodiments where multiple indices are used may implement different types of indices according to the properties of the identity shard to be stored via the data storage system 904. For example, a data storage system 904 may generate a dense index for archives over a specified size (as the size of the index itself may be small relative to the number of archives stored on a given volume), and may also generate a sparse index for archives under that specified size (as the ratio of index size to archive size increases).

The data storage system 904 is connected to or includes one or more volumes 906 on which archives or identity shards may be stored. The generated indices for the archives may also be stored on the one or more volumes 906. The volumes 906 may be any container, whether logical or physical, capable of storing or addressing data stored therein. In some embodiments, the volumes 906 may map on a one-to-one basis with the data storage devices on which they reside (and, in some embodiments, may actually be the data storage devices themselves). In some embodiments, the size and/or quantity of the volumes 906 may be independent of the capacity of the data storage devices on which they reside (e.g., a set of volumes may each be of a fixed size such that a second set of volumes may reside on the same data storage devices as the first set). The data storage devices may include any resource or collection of resources, such as those of a computing resource service provider, that are capable of storing data, and may be physical, virtual, or some combination of the two.

As previously described, one or more indices may, in some embodiments, be generated for each volume of the plurality of volumes 906, and in such embodiments, may reflect the archives stored on the respective volume to which it applies. In embodiments where sparse indices are used, a sparse index for a given volume may point to a subset of archives stored or to be stored on that volume, such as those archives which may be determined to be stored on the volume based on the sort techniques mentioned previously. The subset of archives to be indexed in the sparse index may be selected on any appropriate basis and for any appropriate interval. For example, the sparse index may identify the archives to be located at every x blocks or bytes of the volume (e.g., independently of the boundaries and/or quantity of the archives themselves). As another example, the sparse index may identify every nth archive to be stored on the volume. As may be contemplated, the indices (whether sparse or otherwise) may be determined prior to actually storing the archives on the respective volumes. In some embodiments, a space may be reserved on the volumes so as to generate and/or write the appropriate indices after the archives have been written to the volumes 906.
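The two sparse index intervals mentioned above (every nth archive, and every x bytes of the volume) can be sketched as follows. The sketch is illustrative only: it assumes archives of nonzero size are written back to back in sort order, and the names `Archive`, `layout`, `sparse_index_every_nth`, and `sparse_index_every_x_bytes` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Archive:
    archive_id: str
    size: int  # bytes; assumed to be greater than zero

def layout(archives):
    """Planned (archive_id, byte_offset) pairs, computed before any write."""
    offset = 0
    placed = []
    for archive in archives:
        placed.append((archive.archive_id, offset))
        offset += archive.size
    return placed

def sparse_index_every_nth(archives, n):
    """Index only every nth archive on the volume."""
    return [entry for i, entry in enumerate(layout(archives)) if i % n == 0]

def sparse_index_every_x_bytes(archives, x):
    """For each multiple of x bytes, record the archive occupying that byte."""
    placed = layout(archives)
    offsets = [offset for _, offset in placed]
    total = sum(archive.size for archive in archives)
    index = []
    for boundary in range(0, total, x):
        i = bisect_right(offsets, boundary) - 1  # last archive starting at or before boundary
        index.append((boundary, placed[i][0]))
    return index

planned = [Archive("a", 100), Archive("b", 50), Archive("c", 300),
           Archive("d", 25), Archive("e", 75)]
print(sparse_index_every_nth(planned, 2))
print(sparse_index_every_x_bytes(planned, 200))
```

Because both indices are derived from the planned layout alone, they can be determined before the archives are actually written, consistent with the paragraph above.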

In some embodiments, the sparse indices are used in connection with information relating to the sort order of the archives so as to locate archives without necessitating the use of dense indices, for example, those that account for every archive on a given volume. Such sort order-related information may reside on the volumes 906 or, in some embodiments, on an entity separate from the volumes 906, such as in a data store or other resource of a computing resource service provider. Similarly, the indices may be stored on the same volumes 906 to which they apply, or, in some embodiments, separately from such volumes 906.

The archives may be stored, bit for bit (e.g., the “original data” of the archives), on a subset of the plurality of volumes 906. Also as mentioned, appropriate indices may also be stored on the applicable subset of the plurality of volumes 906. The original data of the archives is stored as a plurality of shards across a plurality of volumes, the quantity of which (either shards or volumes, which in some cases may have a one to one relationship) may be predetermined according to various factors, including the number of total shards that may be used to reconstruct the original data using a redundancy code. In some embodiments, the number of volumes used to store the original data of the archives is the quantity of shards that may be used to reconstruct the original data from a plurality of shards generated by a redundancy code from the original data. As an example, FIG. 9 illustrates five volumes, three of which contain original data archives 908 and two of which contain derived data 910, such as redundancy encoded data. In the illustrated example, the redundancy code used may require any three shards to regenerate original data, and therefore, a quantity of three volumes may be used to write the original data (even prior to any application of the redundancy code).

The volumes 906 bearing the original data archives 908 may each contain or be considered as shards unto themselves. For example, the data 902 from preliminary storage may be copied directly to a volume only if, as described herein, it is stored in preliminary storage as an identity shard. In embodiments where the sort order-related information and/or the indices are stored on the applicable volumes 906, they may be included with the original data of the archives and stored therewith as shards, as previously mentioned. In the illustrated example, the original data archives 908 are stored as three shards (which may include the respective indices) on three associated volumes 906. In some embodiments, the original data archives 908 (and, in embodiments where the indices are stored on the volumes, the indices) are processed by an entity associated with, for example, the archival storage service, using a redundancy code, such as an erasure code, so as to generate the remaining shards, which contain encoded information rather than the original data of the original data archives. The original data archives 908 may be processed using the redundancy code at any time after being sorted, such as prior to being stored on the volumes, contemporaneously with such storage, or after such storage.

Such encoded information may be any mathematically computed information derived from the original data, and depends on the specific redundancy code applied. As mentioned, the redundancy code may include erasure codes (such as online codes, Luby transform codes, raptor codes, parity codes, Reed-Solomon codes, Cauchy codes, Erasure Resilient Systematic Codes, regenerating codes, or maximum distance separable codes) or other forward error correction codes. In some embodiments, the redundancy code may implement a generator matrix that implements mathematical functions to generate multiple encoded objects correlated with the original data to which the redundancy code is applied. In some of such embodiments, an identity matrix is used, wherein no mathematical functions are applied and the original data (and, if applicable, the indices) are allowed to pass straight through. In such embodiments, it may be therefore contemplated that the volumes bearing the original data (and the indices) may correspond to objects encoded from that original data by the identity matrix rows of the generator matrix of the applied redundancy code, while volumes bearing derived data correspond to other rows of the generator matrix. In the example illustrated in FIG. 9, the five volumes 906 include three volumes that have shards (e.g., identity shards) corresponding to the original data of the original data archives 908, while two have encoded shards corresponding to the derived data 910 (also referred to herein as “derived shards”). As illustrated in FIG. 9, the three original data archives 908, and the two encoded shards corresponding to the derived data 910 form a bundle 918 (denoted by the dashed line). In this example, the applied redundancy code may result in the data being stored in a “3:5” scheme, wherein any three shards of the five stored shards are required to regenerate the original data, regardless of whether the selected three shards contain the original data or the derived data.
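A “3:5” scheme of this kind can be sketched with a small systematic code. The following is an illustrative assumption, not the code used by any particular embodiment: it treats the three identity symbols at each position as values of a polynomial at x = 0, 1, 2 over the prime field GF(257) and derives the two remaining shards by evaluating that polynomial at x = 3 and x = 4. The identity shards therefore pass through unchanged, and any three of the five shards determine the polynomial. The names `encode` and `reconstruct` are hypothetical, and the modular inverse via `pow(den, -1, P)` requires Python 3.8 or later.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(identity, total_shards):
    """identity: k equal-length lists of byte values (shards 0..k-1).
    Returns {shard_number: symbols}; shards k.. hold derived data."""
    k = len(identity)
    shards = {i: list(symbols) for i, symbols in enumerate(identity)}
    for x in range(k, total_shards):
        shards[x] = [
            _lagrange_eval([(i, identity[i][pos]) for i in range(k)], x)
            for pos in range(len(identity[0]))
        ]
    return shards

def reconstruct(available, k, wanted):
    """Rebuild shard number `wanted` from any k available shards."""
    chosen = list(available.items())[:k]
    length = len(chosen[0][1])
    return [
        _lagrange_eval([(x, symbols[pos]) for x, symbols in chosen], wanted)
        for pos in range(length)
    ]

identity = [list(b"abc"), list(b"def"), list(b"ghi")]
shards = encode(identity, total_shards=5)
survivors = {n: shards[n] for n in (1, 3, 4)}  # one identity, two derived
assert bytes(reconstruct(survivors, 3, wanted=0)) == b"abc"
```

Derived symbols in this sketch may take the value 256 and so do not always fit in a byte; a production code would more likely work in GF(256), which is omitted here to keep the arithmetic readable.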

In some embodiments, if one of the volumes 906 or a shard stored thereon is detected as corrupt, missing, or otherwise unavailable, a new shard may be generated using the redundancy code applied to generate the shard(s) in the first instance. The new shard may be stored on the same volume or a different volume, depending, for example, on whether the shard is unavailable for a reason other than the failure of the volume. The new shard may be generated by, for example, the data storage system 904, by using a quantity of the remaining shards that may be used to regenerate the original data (and the index, if applicable) stored across all volumes, regenerating that original data, and either replacing the portion of the original data corresponding to that which was unavailable (in the case that the unavailable shard contains original data), or reapplying the redundancy code so as to provide derived data for the new shard.

As previously discussed, in some embodiments, the new shard may be a replication of the unavailable shard, such as may be the case if the unavailable shard includes original data of the archive(s). In some embodiments, the new shard may be selected from a set of potential shards as generated by, for example, a generator matrix associated with the redundancy code, so as to differ in content from the unavailable shard (such as may be the case if the unavailable shard was a shard generated from the redundancy code, and therefore contains no original data of the archives). As discussed throughout this disclosure, the shards and/or volumes may be grouped and/or layered.

In some embodiments, retrieval of an archive stored in accordance with the techniques described herein may be requested by a client entity under control of a customer of the computing resource service provider and/or the archival storage service provided therefrom, as described in further detail throughout this disclosure. In response to the request, the data storage system 904 may locate, based on information regarding the sort order of the archives as stored on the volumes 906, the specific volume on which the archive is located. Thereafter, the index or indices may be used to locate the specific archive, whereupon it may be read from the volume and provided to a requesting client entity. In embodiments where sparse indices are employed, the sort order information may be used to locate the nearest location (or archive) that is sequentially prior to the requested archive, whereupon the volume is sequentially read from that location or archive until the requested archive is found. In embodiments where multiple types of indices are employed, the data storage system 904 may initially determine which of the indices includes the most efficient location information for the requested archive based on assessing the criteria used to deploy the multiple types of indices in the first instance. For example, if archives under a specific size are indexed in a sparse index and archives equal to or over that size are indexed in a parallel dense index, the data storage system 904 may first determine the size of the requested archive, and if the requested archive is larger than or equal to the aforementioned size boundary, the dense index may be used so as to more quickly obtain the precise location of the requested archive.
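The retrieval path described above, with a dense index for larger archives and a sparse index plus sequential read for smaller ones, might look like the following sketch. It is illustrative only and makes several assumptions: archives are sorted by their identifier, the requested archive's size is already known to the caller, and `Volume`, `SIZE_BOUNDARY`, and `SPARSE_STRIDE` are hypothetical names and values.

```python
from bisect import bisect_right

SIZE_BOUNDARY = 1024  # hypothetical size threshold, in bytes
SPARSE_STRIDE = 2     # hypothetical: index every 2nd small archive

class Volume:
    def __init__(self, records):
        # records: (archive_id, payload) pairs, already sorted by archive_id
        self.records = list(records)
        self.dense = {}  # archive_id -> position, large archives only
        small = []
        for pos, (archive_id, payload) in enumerate(self.records):
            if len(payload) >= SIZE_BOUNDARY:
                self.dense[archive_id] = pos
            else:
                small.append((archive_id, pos))
        self.sparse = small[::SPARSE_STRIDE]
        self.sparse_keys = [archive_id for archive_id, _ in self.sparse]

    def read(self, archive_id, size):
        if size >= SIZE_BOUNDARY:
            # Dense index gives the precise location directly.
            return self.records[self.dense[archive_id]][1]
        # Sparse index: find the nearest indexed archive that is
        # sequentially prior to the request, then read forward.
        i = bisect_right(self.sparse_keys, archive_id) - 1
        start = self.sparse[i][1] if i >= 0 else 0
        for candidate_id, payload in self.records[start:]:
            if candidate_id == archive_id:
                return payload
            if candidate_id > archive_id:
                break  # passed the sort position; archive is not here
        raise KeyError(archive_id)

volume = Volume([("a", b"x" * 10), ("b", b"y" * 2000), ("c", b"z" * 20),
                 ("d", b"w" * 30), ("e", b"v" * 40)])
assert volume.read("b", 2000) == b"y" * 2000  # served by the dense index
assert volume.read("e", 40) == b"v" * 40      # sparse entry "d", then scan
```

The sequential scan relies on the persisted sort order: once an identifier greater than the requested one is seen, the search can stop.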

In some embodiments, the volumes 906 may be grouped such that each given volume has one or more cohorts 916. In such embodiments, a volume set (e.g., all of the illustrated volumes 906) may be implemented such that incoming archives to be stored on the volumes are apportioned to one or more failure-decorrelated subsets of the volume set. The failure-decorrelated subsets may be some combination of the volumes 906 of the volume subset, where the quantity of volumes correlates to a number of shards required for the implemented redundancy code. In the illustrated example, the overall volume set may comprise two failure-decorrelated subsets (volumes in a horizontal row) where a given constituent volume is paired with a cohort (e.g., the cohort 916). In some embodiments, the incoming archives are apportioned to one or more of the cohorts in the failure-decorrelated subset according to, for example, a predetermined sequence, based on one or more attributes of the incoming archives, and the like.

The illustrated example shows, for clarity, a pair-wise cohort scheme, though other schemes are contemplated as within scope of this disclosure, some of which are outlined in greater detail herein. In the illustrated example, some of the volumes of the volume set store original data of incoming archives (e.g., original data archives 908 and/or original data archives 912), while others store derived data (e.g., derived data 910 and derived data 914). The data storage system 904 may implement a number of failure-decorrelated subsets to which to store the incoming archives, and in the pair-wise scheme pictured, the volumes used for a given archive may differ based on some arbitrary or predetermined pattern. As illustrated, some archives may be apportioned to volumes of a given cohort that are assigned to one pattern, or failure-decorrelated subset as shown by original data archives 908 and derived data 910, while others are apportioned to volumes in a different pattern as shown by original data archives 912 and derived data 914. The patterns, as mentioned, may be arbitrary, predefined, and/or in some cases, sensitive to attributes of the incoming data. In some embodiments, patterns may not be used at all, and the member volumes of a given failure-decorrelated subset may be selected randomly from a pool of volumes in the volume set.
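The apportioning options described above can be sketched as follows. The sketch is a loose illustration under stated assumptions: it models a failure-decorrelated subset simply as a list of volume names, uses consecutive non-overlapping groups as the predefined pattern, and uses round-robin assignment as the predetermined sequence. The function names and the ten-volume pool are hypothetical.

```python
import random

def pattern_subsets(volume_pool, shards_needed):
    """Predefined pattern: consecutive, non-overlapping groups of volumes,
    each large enough to hold one full set of shards."""
    last_start = len(volume_pool) - shards_needed
    return [volume_pool[i:i + shards_needed]
            for i in range(0, last_start + 1, shards_needed)]

def random_subset(volume_pool, shards_needed, rng):
    """No pattern: pick the member volumes at random from the pool."""
    return rng.sample(volume_pool, shards_needed)

def apportion(archive_ids, subsets):
    """Assign incoming archives to subsets in a predetermined sequence
    (round-robin here; an attribute-based rule would also fit)."""
    return {archive_id: subsets[i % len(subsets)]
            for i, archive_id in enumerate(archive_ids)}

pool = [f"vol-{n}" for n in range(10)]               # hypothetical volume set
subsets = pattern_subsets(pool, shards_needed=5)     # two subsets of five
placement = apportion(["a1", "a2", "a3"], subsets)
print(placement["a1"])
print(placement["a2"])
print(random_subset(pool, 5, random.Random()))
```

With ten volumes and five shards per archive, the pattern yields two subsets, so consecutive archives land on disjoint groups of volumes and a failure confined to one group affects only the archives apportioned to it.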

FIG. 10 illustrates an example process 1000 for applying redundancy encoding techniques to data stored in durable storage as described herein in connection with FIG. 7 and in accordance with at least one embodiment. The example process 1000 illustrated in FIG. 10 illustrates the processing, indexing, storing, and retrieving of data stored on a data storage system. The data may be retrieved from preliminary storage as described herein. The example process 1000 illustrated in FIG. 10 may be used in conjunction with a grid encoding technique such as that described in connection with FIG. 8, in conjunction with a bundle encoding technique such as that described in connection with FIG. 9, or with some other redundancy encoding technique. A data storage service such as the data storage service described herein may perform the example process 1000 illustrated in FIG. 10.

At step 1002, a resource of a data storage system, such as that implementing a redundancy code to store archives, determines which subset (e.g., quantity) of a plurality of volumes may be used to recreate the original data to be stored, based on, for example, a redundancy code to be applied to the archives. For example, in accordance with the techniques described above in connection with FIG. 9, such information may be derived from predetermining the parameters of an erasure code with a specified ratio of shards that may be used to regenerate the original data from which they derive to the total number of shards generated from the application of the erasure code.

At step 1004, original data, such as original data of archives received from customers of, for example, a data storage system or a computing resource service provider as described in further detail herein, is sorted by, for example, the data storage system or associated entity. For example, the sort order may be implemented on one or more attributes of the incoming data.

At step 1006, one or more indices, such as sparse indices, are generated by, for example, the data storage system, for the original data. For example, there may be more than one index for a given volume, and such parallel indices may be of different types depending on the nature of the archives and/or original data being stored.

At step 1008, the original data is stored, for example, by the data storage system, on the subset of volumes determined in connection with step 1002, and in the order determined in step 1004. Additionally, at step 1010, the index generated in step 1006 is stored, for example, by the data storage system, on an appropriate entity. For example, the index may be stored as part of a shard on which the original data is stored, or, in some embodiments, may be stored on a separate resource from that which persists the volume.

At step 1012, the redundancy code is applied, for example, by the data storage system, to the determined subset of volumes (e.g., shards, as previously described herein), and additional shards containing data derived from the application of the redundancy code are stored on a predetermined quantity of volumes outside the subset determined in connection with step 1002. For example, the ratio of volumes (e.g., shards as previously described herein) storing the original data to the overall quantity of volumes (including those storing the derived data generated in this step 1012) may be prescribed by the recovery/encoding ratio of the redundancy code applied herein.

At step 1014, in normal operation, requested data may be retrieved, for example, by the data storage system, directly from the subset of volumes storing the original data, without necessitating retrieval and further processing (e.g., by the redundancy code) from the volumes storing the derived data generated in step 1012. However, at step 1016, if any of the volumes are determined, for example, by the data storage system, to be unavailable, a replacement shard may be generated by the data storage system by reconstructing the original data from a quorum of the remaining shards, and re-encoding using the redundancy code to generate the replacement shard. The replacement shard may be the same or may be different from the shard detected as unavailable.
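Steps 1002 through 1016 can be traced end to end in a compact sketch. The sketch is illustrative only: it assumes a single XOR parity shard (a k:(k+1) parity code) in place of whatever redundancy code an embodiment applies, keeps the index in memory rather than persisting it, and uses hypothetical names (`store`, `read`, `repair`, `xor_blocks`).

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR across equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def store(archives, k):
    """archives: {archive_id: payload}. Returns (shards, index)."""
    volumes = [bytearray() for _ in range(k)]      # step 1002: k identity volumes
    ordered = sorted(archives.items())             # step 1004: sort by archive_id
    index = {}                                     # step 1006: build the index
    for i, (archive_id, payload) in enumerate(ordered):
        v = i % k
        index[archive_id] = (v, len(volumes[v]), len(payload))
        volumes[v] += payload                      # step 1008: store original data
    width = max(len(v) for v in volumes)
    shards = [bytes(v.ljust(width, b"\0")) for v in volumes]
    shards.append(xor_blocks(shards))              # step 1012: derived shard
    return shards, index                           # step 1010: index kept by caller

def read(shards, index, archive_id):
    """Step 1014: read directly from the identity shard, no decoding."""
    v, offset, length = index[archive_id]
    return shards[v][offset:offset + length]

def repair(shards, lost):
    """Step 1016: regenerate one unavailable shard from all the others."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    return xor_blocks(survivors)

shards, index = store({"b": b"bravo", "a": b"alpha!", "c": b"charlie"}, k=2)
assert read(shards, index, "c") == b"charlie"
assert repair(shards, lost=0) == shards[0]
```

With a single parity shard the replacement is always identical to the lost shard; codes with more derived shards allow the replacement to differ from it, as noted above.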

FIG. 11 illustrates aspects of an example environment 1100 for implementing aspects in accordance with various embodiments. As will be appreciated, although a web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The environment includes an electronic client device 1102, which can include any appropriate device operable to send and/or receive requests, messages, or information over an appropriate network 1104 and, in some embodiments, convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, tablet computers, set-top boxes, personal data assistants, embedded computer systems, electronic book readers, and the like. The network can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, a satellite network, or any other such network and/or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Many protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled by wired or wireless connections and combinations thereof. In this example, the network includes the Internet and/or other publicly-addressable communications network, as the environment includes a web server 1106 for receiving requests and serving content in response thereto, although for other networks an alternative device serving a similar purpose could be used as would be apparent to one of ordinary skill in the art.

The illustrative environment includes at least one application server 1108 and a data store 1110. It should be understood that there can be several application servers, layers or other elements, processes or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. Servers, as used herein, may be implemented in various ways, such as hardware devices or virtual computer systems. In some contexts, servers may refer to a programming module being executed on a computer system. As used herein, unless otherwise stated or clear from context, the term “data store” refers to any device or combination of devices capable of storing, accessing, and retrieving data, which may include any combination and number of data servers, databases, data storage devices, and data storage media, in any standard, distributed, virtual, or clustered environment. The application server can include any appropriate hardware, software, and firmware for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling some or all of the data access and business logic for an application. The application server may provide access control services in cooperation with the data store and is able to generate content including, but not limited to, text, graphics, audio, video, and/or other content usable to be provided to the user, which may be served to the user by the web server in the form of HyperText Markup Language (“HTML”), Extensible Markup Language (“XML”), JavaScript, Cascading Style Sheets (“CSS”), JavaScript Object Notation (“JSON”), and/or another appropriate client-side structured language. Content transferred to a client device may be processed by the client device to provide the content in one or more forms including, but not limited to, forms that are perceptible to the user audibly, visually, and/or through other senses. The handling of all requests and responses, as well as the delivery of content between the client device 1102 and the application server 1108, can be handled by the web server using PHP: Hypertext Preprocessor (“PHP”), Python, Ruby, Perl, Java, HTML, XML, JSON, and/or another appropriate server-side structured language in this example. Further, operations described herein as being performed by a single device may, unless otherwise clear from context, be performed collectively by multiple devices, which may form a distributed and/or virtual system.

The data store 1110 can include several separate data tables, databases, data documents, dynamic data storage schemes and/or other data storage mechanisms and media for storing data relating to a particular aspect of the present disclosure. For example, the data store illustrated may include mechanisms for storing production data 1112 and user information 1116, which can be used to serve content for the production side. The data store also is shown to include a mechanism for storing log data 1114, which can be used for reporting, analysis, or other such purposes. It should be understood that there can be many other aspects that may need to be stored in the data store, such as page image information and access rights information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store 1110. The data store 1110 is operable, through logic associated therewith, to receive instructions from the application server 1108 and obtain, update or otherwise process data in response thereto. The application server 1108 may provide static, dynamic, or a combination of static and dynamic data in response to the received instructions. Dynamic data, such as data used in web logs (blogs), shopping applications, news services and other such applications may be generated by server-side structured languages as described herein or may be provided by a content management system (“CMS”) operating on, or under the control of, the application server. In one example, a user, through a device operated by the user, might submit a search request for a certain type of item. In this case, the data store might access the user information to verify the identity of the user and can access the catalog detail information to obtain information about items of that type. The information then can be returned to the user, such as in a results listing on a web page that the user is able to view via a browser on the user device 1102. Information for a particular item of interest can be viewed in a dedicated page or window of the browser. It should be noted, however, that embodiments of the present disclosure are not necessarily limited to the context of web pages, but may be more generally applicable to processing requests in general, where the requests are not necessarily requests for content.

Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server and typically will include a computer-readable storage medium (e.g., a hard disk, random access memory, read only memory, etc.) storing instructions that, when executed (i.e., as a result of being executed) by a processor of the server, allow the server to perform its intended functions.

The environment, in one embodiment, is a distributed and/or virtual computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 11. Thus, the depiction of the system 1100 in FIG. 11 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.

The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of computers, such as desktop, laptop, or tablet computers running a standard operating system, as well as cellular, wireless, and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems, and other devices capable of communicating via a network. These devices also can include virtual devices such as virtual machines, hypervisors, and other virtual devices capable of communicating via a network.

Various embodiments of the present disclosure utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as Transmission Control Protocol/Internet Protocol (“TCP/IP”), User Datagram Protocol (“UDP”), protocols operating in various layers of the Open System Interconnection (“OSI”) model, File Transfer Protocol (“FTP”), Universal Plug and Play (“UPnP”), Network File System (“NFS”), Common Internet File System (“CIFS”), and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, a satellite network, and any combination thereof. In some embodiments, connection-oriented protocols may be used to communicate between network endpoints. Connection-oriented protocols (sometimes called connection-based protocols) are capable of transmitting data in an ordered stream. Connection-oriented protocols can be reliable or unreliable. For example, the TCP protocol is a reliable connection-oriented protocol. Asynchronous Transfer Mode (“ATM”) and Frame Relay are unreliable connection-oriented protocols. Connection-oriented protocols are in contrast to packet-oriented protocols such as UDP that transmit packets without a guaranteed ordering.
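By way of non-limiting illustration, the contrast between a connection-oriented protocol and a packet-oriented protocol can be sketched in Python using the standard `socket` module. The function names `tcp_roundtrip` and `udp_roundtrip`, the loopback address, and the buffer sizes are assumptions chosen for this sketch and do not appear in the disclosure.

```python
import socket


def tcp_roundtrip(payloads):
    """Send payloads over one loopback TCP connection and return the byte stream received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    conn, _ = server.accept()
    try:
        for payload in payloads:
            client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
        # TCP delivers one ordered byte stream; message boundaries are not preserved.
        return b"".join(chunks)
    finally:
        conn.close()
        client.close()
        server.close()


def udp_roundtrip(payload):
    """Send a single datagram over loopback UDP and return it as received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, receiver.getsockname())  # no connection is established
        data, _ = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


stream = tcp_roundtrip([b"shard-1;", b"shard-2;", b"shard-3;"])
datagram = udp_roundtrip(b"shard-1")
```

In this sketch the TCP receiver sees the three payloads as one ordered stream, whereas each UDP datagram is sent independently with no ordering or delivery guarantee relative to other datagrams.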

In embodiments utilizing a web server, the web server can run any of a variety of server or mid-tier applications, including Hypertext Transfer Protocol (“HTTP”) servers, FTP servers, Common Gateway Interface (“CGI”) servers, data servers, Java servers, Apache servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Ruby, PHP, Perl, Python or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase and IBM® as well as open-source servers such as MySQL, Postgres, SQLite, MongoDB, and any other server capable of storing, retrieving, and accessing structured or unstructured data. Database servers may include table-based servers, document-based servers, unstructured servers, relational servers, non-relational servers or combinations of these and/or other database servers.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (“CPU” or “processor”), at least one input device (e.g., a mouse, keyboard, controller, touch screen or keypad) and at least one output device (e.g., a display device, printer or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices and solid-state storage devices such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.

Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services, or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or web browser. In addition, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (“EEPROM”), flash memory or other memory technology, Compact Disc Read-Only Memory (“CD-ROM”), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by the system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

Other variations are within the spirit of the present disclosure. Thus, while the disclosed techniques are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions and equivalents falling within the spirit and scope of the invention, as defined in the appended claims.

The use of the terms “a” and “an” and “the” and similar referents in the context of describing the disclosed embodiments (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. The term “connected,” when unmodified and referring to physical connections, is to be construed as partly or wholly contained within, attached to or joined together, even if there is something intervening. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein and each separate value is incorporated into the specification as if it were individually recited herein. The use of the term “set” (e.g., “a set of items”) or “subset” unless otherwise noted or contradicted by context, is to be construed as a nonempty collection comprising one or more members. Further, unless otherwise noted or contradicted by context, the term “subset” of a corresponding set does not necessarily denote a proper subset of the corresponding set, but the subset and the corresponding set may be equal.

Conjunctive language, such as phrases of the form “at least one of A, B, and C,” or “at least one of A, B and C,” unless specifically stated otherwise or otherwise clearly contradicted by context, is otherwise understood with the context as used in general to present that an item, term, etc., may be either A or B or C, or any nonempty subset of the set of A and B and C. For instance, in the illustrative example of a set having three members, the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of A, at least one of B and at least one of C each to be present.
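For illustration only, the seven sets listed above are exactly the nonempty subsets of a three-member set, which can be enumerated with a short Python sketch. The helper name `nonempty_subsets` is an assumption of this sketch and is not a term used in the disclosure.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Return every nonempty subset of items, smallest subsets first."""
    items = list(items)
    return [
        set(combo)
        for size in range(1, len(items) + 1)  # size 0 (the empty set) is skipped
        for combo in combinations(items, size)
    ]


subsets = nonempty_subsets(["A", "B", "C"])
```

A set of n members has 2^n - 1 nonempty subsets, so the three-member example yields seven sets, and the full set {A, B, C} is among them because a subset need not be a proper subset.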

Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. Processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. The code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory. In some embodiments, the code is stored on a set of one or more non-transitory computer-readable storage media having stored thereon executable instructions that, when executed (i.e., as a result of being executed) by one or more processors of a computer system, cause the computer system to perform operations described herein. The set of non-transitory computer-readable storage media may comprise multiple non-transitory computer-readable storage media and one or more of individual non-transitory storage media of the multiple non-transitory computer-readable storage media may lack all of the code while the multiple non-transitory computer-readable storage media collectively store all of the code.

Accordingly, in some examples, computer systems are configured to implement one or more services that singly or collectively perform operations of processes described herein. Such computer systems may, for instance, be configured with applicable hardware and/or software that enable the performance of the operations. Further, computer systems that implement various embodiments of the present disclosure may, in some examples, be single devices and, in other examples, be distributed computer systems comprising multiple devices that operate differently such that the distributed computer system performs the operations described herein and such that a single device may not perform all operations.

The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

Embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate and the inventors intend for embodiments of the present disclosure to be practiced otherwise than as specifically described herein. Accordingly, the scope of the present disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the scope of the present disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

What is claimed is:
 1. A computer-implemented method, comprising: under the control of one or more computer systems configured with executable instructions, configuring a data storage system to at least: apportion at least a first bundle of redundancy coded shards and a second bundle of redundancy coded shards between a plurality of data transfer devices provisioned by the data storage system to be capable of processing data storage requests and data retrieval requests without a network connection between the plurality of data transfer devices and the data storage system, the first bundle including at least a first identity shard, a second identity shard, and a first derived shard, the first bundle being configured such that a first quorum quantity of shards of the first bundle is sufficient to reconstruct, using a redundancy code, original data associated with the first identity shard, the second bundle including the second identity shard, a second derived shard, and a third identity shard, the second bundle being configured such that a second quorum quantity of shards of the second bundle is sufficient to reconstruct, using the redundancy code, the second identity shard, the first bundle and second bundle overlapping by virtue of both including the second identity shard; and configure a fill pattern such that the first identity shard, the second identity shard, and the third identity shard are subject to receiving data for storage in a specified order comprising, sequentially, the first identity shard, the second identity shard, and the third identity shard; monitoring the plurality of data transfer devices to detect an event associated with the first identity shard that indicates an inability to accept additional data; and if the event is detected, at least: configuring any data storage requests to store associated data in the second identity shard; initiating an ingestion process of the data storage system to transfer, by a data transfer device of the plurality of data transfer devices, data associated with the first identity shard to durable storage of the data storage system; verifying that the data associated with the first identity shard is durably stored in the data storage system; and if verified that the data associated with the first identity shard is durably stored, at least: deleting the first identity shard and the first derived shard; generating a third bundle comprising a fourth identity shard, the third identity shard, and a third derived shard, the third bundle overlapping with the second bundle by virtue of sharing the third identity shard; and adding the fourth identity shard to the specified order of the fill pattern after the third identity shard.
 2. The computer-implemented method of claim 1, wherein each of the shards of the first bundle and the second bundle are stored on a different data transfer device of the plurality of data transfer devices.
 3. The computer-implemented method of claim 1, wherein at least two shards of the first bundle or the second bundle are stored on a data transfer device of the plurality of data transfer devices.
 4. The computer-implemented method of claim 1, wherein at least one of the data transfer devices of the plurality of data transfer devices is reused for either the fourth identity shard or the third derived shard after either the first identity shard or the first derived shard is deleted therefrom.
 5. The computer-implemented method of claim 4, wherein the plurality of data transfer devices is monitored by an entity of the plurality of data transfer devices to detect the event.
 6. The computer-implemented method of claim 1, wherein the ingestion process is initiated in connection with physical co-location of the data transfer device with the data storage system.
 7. The computer-implemented method of claim 1, wherein the ingestion process is initiated such that the data associated with the first identity shard is transferred from the plurality of data transfer devices to the data storage system over a network between at least one of the data transfer devices and the data storage system.
 8. The computer-implemented method of claim 1, wherein the first quorum quantity includes all shards in the first bundle other than the first identity shard.
 9. The computer-implemented method of claim 1, wherein the redundancy code is a linear erasure code.
 10. A system, comprising at least one computing device configured to implement one or more services, wherein the one or more services are configured to: provision a first plurality of data transfer devices to store at least a first bundle of redundancy coded shards and a second bundle of redundancy coded shards, the first bundle including at least a first identity shard, a second identity shard, and a first derived shard, the first bundle being configured such that a first quorum quantity of shards of the first bundle is sufficient to reconstruct, using a redundancy code, original data associated with the first identity shard, the second bundle including the second identity shard, a second derived shard, and a third identity shard, the second bundle being configured such that a second quorum quantity of shards of the second bundle is sufficient to reconstruct, using the redundancy code, the second identity shard, the first bundle and second bundle overlapping by virtue of both including the second identity shard; monitor data transfer devices associated with the first bundle for an event associated with the first identity shard; and if the event is detected, at least: store data associated with data storage requests received after the event on a data transfer device associated with the second identity shard; transfer data associated with the first identity shard from the data transfer device to a data storage system; delete the first identity shard and the first derived shard; and provision a second plurality of data transfer devices to store a third bundle comprising a fourth identity shard, the third identity shard, and a third derived shard, the third bundle overlapping with the second bundle by virtue of sharing the third identity shard.
 11. The system of claim 10, wherein the one or more services are further configured to: in response to a retrieval request for a set of data associated with the first identity shard after the event: if the first identity shard is available, service the retrieval request by at least retrieving the set of data from the first identity shard; and if the first identity shard is unavailable, service the retrieval request by regenerating the set of data, using the redundancy code, from the quorum quantity of other shards of the first bundle.
 12. The system of claim 11, wherein if the transfer of the data has taken place and the first identity shard and the first derived shard have been deleted, service the retrieval request using the data storage system.
 13. The system of claim 10, wherein the data storage system is a data transfer device outside of the first plurality or the second plurality of data transfer devices.
 14. The system of claim 10, wherein the data storage system is an archival data storage service.
 15. The system of claim 10, wherein the second plurality of data transfer devices comprises the data transfer devices on which the first identity shard and the first derived shard were stored prior to deletion.
 16. A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: generate at least a first bundle of redundancy coded shards and a second bundle of redundancy coded shards, the first bundle including at least a first identity shard, a second identity shard, and a first derived shard, the first bundle being configured such that a first quorum quantity of shards of the first bundle is sufficient to reconstruct, using a first redundancy code, original data associated with the first identity shard, the second bundle including the second identity shard, a second derived shard, and a third identity shard, the second bundle being configured such that a second quorum quantity of shards of the second bundle is sufficient to reconstruct, using a second redundancy code, the second identity shard, the first bundle and second bundle overlapping by virtue of both including the second identity shard; monitor the first bundle for an event associated with the first identity shard; and if the event is detected, at least: store data associated with data storage requests received after the event in the second identity shard; transfer data associated with the first identity shard to a data storage system; delete the first identity shard and the first derived shard; and generate a third bundle comprising a fourth identity shard, the third identity shard, and a third derived shard, the third bundle overlapping with the second bundle by virtue of sharing the third identity shard.
 17. The non-transitory computer-readable storage medium of claim 16, wherein the first redundancy code and the second redundancy code are the same redundancy code.
 18. The non-transitory computer-readable storage medium of claim 16, wherein the instructions further cause the computer system to transfer the data from the first identity shard to the data storage system over the Internet.
 19. The non-transitory computer-readable storage medium of claim 16, wherein the data storage system is not physically co-located with the computer system.
 20. The non-transitory computer-readable storage medium of claim 16, wherein the first redundancy code and the second redundancy code are linear erasure codes.
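
As a non-limiting illustration of the cycle recited in the claims above, the following Python sketch models overlapping bundles, a fill pattern, and the replacement of a full identity shard. It rests on several assumptions that are not part of the claims: each bundle holds exactly two identity shards and one derived shard, the redundancy code is simple XOR parity (one example of a linear erasure code), shards have a fixed hypothetical capacity `SHARD_SIZE`, and an in-memory dictionary stands in for the durable storage of the data storage system. The class and method names are hypothetical.

```python
SHARD_SIZE = 8  # hypothetical shard capacity in bytes


def padded(data):
    """Zero-pad shard contents to the fixed shard size."""
    return bytes(data).ljust(SHARD_SIZE, b"\0")


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class CycledCluster:
    def __init__(self):
        # Identity shards 1..3; bundles (1, 2) and (2, 3) overlap on shard 2.
        self.identity = {1: bytearray(), 2: bytearray(), 3: bytearray()}
        self.order = [1, 2, 3]  # fill pattern
        self.derived = {}       # bundle -> derived (parity) shard
        self.durable = {}       # stand-in for the data storage system
        for bundle in [(1, 2), (2, 3)]:
            self._refresh(bundle)

    def _refresh(self, bundle):
        a, b = bundle
        self.derived[bundle] = xor_bytes(padded(self.identity[a]),
                                         padded(self.identity[b]))

    def store(self, data):
        """Store data in the current identity shard, cycling first if it is full."""
        if len(data) > SHARD_SIZE:
            raise ValueError("data exceeds shard capacity")
        if len(self.identity[self.order[0]]) + len(data) > SHARD_SIZE:
            self.cycle()  # the "event": the shard cannot accept more data
        target = self.order[0]
        self.identity[target] += data
        for bundle in list(self.derived):
            if target in bundle:
                self._refresh(bundle)
        return target

    def reconstruct(self, shard_id, bundle):
        """Rebuild an identity shard from the other two shards of its bundle."""
        other = bundle[0] if bundle[1] == shard_id else bundle[1]
        return xor_bytes(self.derived[bundle], padded(self.identity[other]))

    def cycle(self):
        full = self.order[0]
        self.durable[full] = bytes(self.identity[full])  # ingestion
        if self.durable[full] != bytes(self.identity[full]):  # verification stand-in
            raise RuntimeError("ingestion could not be verified")
        self.order.pop(0)  # later requests go to the next identity shard
        del self.identity[full]
        for bundle in [b for b in self.derived if full in b]:
            del self.derived[bundle]  # delete the associated derived shard
        newest = max(self.identity) + 1
        self.identity[newest] = bytearray()
        last = self.order[-1]
        self.order.append(newest)  # new shard goes last in the fill pattern
        self._refresh((last, newest))  # new bundle overlaps on the last shard


cluster = CycledCluster()
first_target = cluster.store(b"abcde")
rebuilt_first = cluster.reconstruct(1, (1, 2))
second_target = cluster.store(b"fghij")  # does not fit in shard 1, so the cluster cycles
```

In this sketch the second write triggers the cycle: shard 1 is copied to the stand-in durable store, shard 1 and the parity of bundle (1, 2) are deleted, and a new bundle (3, 4) is created that overlaps bundle (2, 3) on shard 3. A real embodiment may use different bundle sizes, redundancy codes, and verification steps.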